Foreword

The proposed research is to develop Gaussian random field methods to study fork-join networks
(FJNs) with synchronization constraints. FJNs arise from many military operations, e.g.,
Army force deployment and counter-terrorism, where commands come from one or multiple types
of operations and each operation requires multiple parallel and/or sequential tasks to be processed
in service stations with multiple servers, and to be rejoined for further processing under synchronization
constraints, e.g., non-exchangeability. In this research, we focus on the non-exchangeable
synchronization constraint, which requires that tasks be synchronized only when all tasks of
the same job are completed. The main mathematical challenge lies in the resequencing of arrival
orders after service completion at each station, which requires an infinite-dimensional state space
to track the status of all parallel tasks for each job. This has been an extremely difficult open problem.

We have developed a novel method using multiparameter sequential empirical processes driven 
by service vectors of parallel tasks of each job to describe the system dynamics of FJNs. This 
research has produced two papers, focusing on a single-class FJN in two asymptotic regimes,
in which the arrival rate of jobs and the number of servers in each station grow large at appropriate rates. We
consider the number of tasks in each waiting buffer for synchronization, jointly with the number 
of tasks in each parallel service station and the number of synchronized jobs. In the first paper, 
we consider the quality-driven regime, and show that all the limiting processes are functionals of 
two independent processes: the limiting arrival process and a generalized Kiefer process driven
by the service vector of each job. We characterize the transient and stationary distributions of 
the limiting processes. In the second paper, we consider the quality-and-efficiency-driven regime 
(Halfin-Whitt regime), and show that all the limit processes in the functional central limit theorem 
are also characterized via functionals of the initial limit quantities, the arrival limit process and 
a generalized multiparameter Kiefer process driven by the service vectors. This new framework 
is being further generalized to analyze fork-join networks with multiple classes of jobs, and study 
control, reliability and provisioning problems. 
1 Statement of the Research Problem


Fork-join networks consist of a set of service stations that serve job requests simultaneously and 
sequentially according to pre-designated deterministic precedence constraints. Such networks have 
many applications in manufacturing and telecommunications [4, 16, 25, 26, 27, 43, 53, 36, 37, 49], 
patient flow analysis in healthcare [22, 1, 2, 57, 58], parallel computing [47, 52, 51, 32], military 
deployment operations [24, 56], and law enforcement systems [29]. Two types of synchronization 
constraints are of particular interest. One is called exchangeable synchronization (ES) in which 
tasks are not tagged with a particular job and can be synchronized for a service completion once 
the necessary tasks are completed. This type of synchronization constraint is often used in manufacturing
systems; for example, in many assembly systems, different parts of a product are processed
at separate workstations or plant locations and a product will be assembled once all of its necessary
parts are completed. In this case, the parts are not tagged with a particular product, since
they are standardized for the same type of product. The second type is called non-exchangeable 
synchronization (NES). Tasks are tagged with a particular job and can only be synchronized when 
all the parallel tasks of the same job are completed. 


Figure 1: A fundamental fork-join network (fork, service stations, unsynchronized queues)

Fork-join networks with NES are used in many applications, including healthcare systems, 
parallel computing, MapReduce scheduling (e.g., large-scale parallel Web search), disassembly
and reassembly systems in manufacturing and so on. In patient flows of hospitals [1, 2, 22, 57, 58], 
the treatment and discharge processes are typical examples of fork-join networks with NES: a 
patient must have all test results ready before a doctor examination and these tests are conducted 
in different units/laboratories and can never be mixed; a patient, after the discharge decision is 
made, must wait for necessary procedures, pharmacy, transportation, etc., before being physically 
discharged. In MapReduce scheduling [11, 32, 51, 54], jobs are processed in two phases: in the 
map phase, a large-scale data input (e.g., Web processing data) is distributed into individual 
computation nodes, and each node processes one block of input data, and after the execution of all 
blocks of the same data input, they will be joined as an output in the reduce phase. 

Despite the many appealing applications of such networks, very little has been known about their
behaviors in the many-server heavy-traffic regimes. We start by considering a fundamental fork-join
network model with a single class of jobs and NES, where each arriving job is forked into several 
parallel tasks upon arrival and each of the tasks is processed in parallel at a dedicated service 
station with multiple servers under the non-idling FCFS discipline. Upon service completion, each 
task will join a buffer associated with its service station, and wait for synchronization, such that 
each job is synchronized only if all of its tasks have been completed. Figure 1 depicts such a 
model. In this model, in addition to the service dynamics, we are interested in the waiting buffer 
dynamics for synchronization. One important performance measure is the response time of a job, 
namely, the time from arrival to synchronization. The response time may also include the time 
required for the synchronization process, but we do not consider that in this work. Thus, the 
response time includes two delays, waiting time for service and waiting time for synchronization. 
Since each service station can be regarded as a separate many-server queue, the waiting time for 
service has been well understood. However, the waiting time for synchronization, which is our focus 
in this paper, has not been studied. Specifically, we investigate the waiting buffer dynamics for 
synchronization jointly with the service dynamics. 

The main mathematical challenge lies in the resequencing of the arrival orders after service 
completion at each service station, due to the randomness of the service times and the multi-server 
setting. When there is a single server in each of the parallel service stations and the service discipline
is FCFS, the service completion order is preserved to be the same as the arrival order of tasks in 
each service station, so that the two types of synchronization constraints are equivalent. However, 
the arrival order of tasks in each service station can be resequenced at the service completion 
epochs when the number of servers in a service station is larger than one or the service discipline 
is not FCFS. Resequencing has been one of the most difficult obstacles in the study of fork-join 
networks. Some limited work has been dedicated to the study of such challenging problems. For 
example, substantial efforts were dedicated to the study of the max-plus recursions [21, 3, 12]. 
More recently, Atar et al. [2] have studied a fork-join network with single-server service stations 
where tasks may reenter for service at some service stations in a Bernoulli mechanism so that 
the arrival orders of tasks at each service station are resequenced after service completion. They 
show that under a priority discipline, the system dynamics with NES is asymptotically equivalent 
to that with ES in the conventional (single-server) heavy-traffic regime. For a Markovian fork-join
network with multiple servers, Zviran [58] shows that the system dynamics with NES is also
asymptotically equivalent to that with ES in the conventional heavy-traffic regime. However, the 
two types of synchronization constraints lead to very different system dynamics when the service 
stations have many parallel servers in the Halfin-Whitt regime, as conjectured in [2, 58]. To the 
best of our knowledge, our work is the first to tackle the resequencing problem in non-Markovian 
fork-join networks with NES and multiple-server service stations in the many-server heavy-traffic 
regimes. We consider both the case in which each service station operates in the quality-driven
(QD) regime and the case in which each operates in the quality-and-efficiency-driven (QED, Halfin-Whitt) regime.

When all the service stations operate in the QD regime, this is equivalent to a model which has 
infinite numbers of servers at all service stations asymptotically. To describe the system dynamics, 
we can start with a graphical representation as shown in Figure 2(a) for a system of two parallel 
tasks. At each job’s arrival epoch, we mark the arrival time on the horizontal line (x-axis) and 
the service times of all parallel tasks on the vertical line (y-axis). At each time t, by drawing 
a negative forty-five degree line, we can count the numbers of tasks in each service station and 
each waiting buffer for synchronization. When the arrival process is Poisson, we can apply Poisson
random measure theory, similarly as in the “physics” of $M/GI/\infty$ queues [14]. It can be shown
that at each time $t$, the numbers of tasks in each service station and each waiting buffer for
synchronization all have Poisson distributions and their parameter values and covariances can also
be obtained; see Proposition 2.1. However, when the arrival process is more general, this Poisson
random measure approach does not work, and we cannot obtain the exact distributions for these
performance measures. Thus, we consider heavy-traffic approximations of the system dynamics
when the arrival rate is relatively large. For that, the graphical representation in Figure 2(a) also
plays an important role; see the system’s dynamic equations in §2.

Figure 2: Graphical representations of the system dynamics in the QD and QED regimes

Here we develop a new approach to describe the system dynamics. Both the service dynamics 
and the waiting buffer dynamics for synchronization are represented as functionals of the multiparameter
sequential empirical process driven by the service vector of all parallel tasks. Their
diffusion-scaled processes converge weakly to limit processes that can all be represented as functionals
of two independent processes: the limiting arrival process and the multiparameter generalized
Kiefer process driven by the service vector. When the limiting arrival process is Brownian motion,
we show that the aforementioned limiting processes form a multidimensional continuous Gaussian
process, and thus characterize the joint transient and stationary distributions of these processes.
We also study the impact of the correlation among the components of the service vector upon these distributions.

There are several advantages with this new approach. It gives a clean and elegant representation 
of the limiting processes, involving only two independent stochastic processes arising from the arrival 
and service processes. Moreover, the characterization of the limiting processes as Gaussian and their 
transient and stationary distributional properties can be easily obtained. Furthermore, this new 
approach paves the way to study the fork-join network with all the service stations operating in the 
QED regime. We believe that this new approach launches a new framework to study more general 
fork-join networks, for example, multiclass models, and when the service vectors for parallel tasks 
form a stationary and weakly dependent sequence. 

When all the service stations are in the QED regime, we exploit the delicate relationship between
a finite-server model and its corresponding infinite-server model. This relationship was exploited by
Reed [45] to prove an FCLT for the GI/GI/n queue. We make the important observation that the
multidimensional processes of the waiting buffer dynamics for synchronization and the service
dynamics in the fork-join network can be represented through the corresponding processes in the 
infinite-server case. Thus, our results from the QD regime can be extended to establish the FCLT 
for the fork-join network in the QED regime. To illustrate, we can also use a similar graphical 
representation as in Figure 2(a) to describe the system dynamics. In particular, as shown in Figure 
2(b), we mark the entering service times of all parallel tasks for each job on the horizontal line (x-axis),
and their service times on the vertical line (y-axis). However, unlike the infinite-server
case, tasks of the same job may not enter service simultaneously. Fortunately, it is well known
that the delay for service in the QED regime is $O(1/\sqrt{n})$; see, e.g., [45, 50]. This asymptotically
negligible difference among entering service times helps us to establish the FCLT for the fork-join 
network in the QED regime. 

An important implication of our results is that the size of the waiting buffer for synchronization 
is of the same order as that of the total number of tasks at each service station, and thus, the waiting 
time for synchronization is of the same order as the service time, $O(1)$. Namely, the response time
in the QED regime includes the delay for service $O(1/\sqrt{n})$, the service time $O(1)$ and the delay
for synchronization $O(1)$. It remains to establish the FCLT for the (virtual) waiting time process
for synchronization. More importantly, it remains to find an optimal scheduling policy that will 
minimize the delay for synchronization in the single-class case. We believe that our methods and 
results will provide useful insights towards that direction. 

In the development of approximations to the fork-join system, we make a fundamental contribution
to the study of multiparameter sequential empirical processes driven by random vectors.
Sequential empirical processes driven by a sequence of random vectors (allowing for correlation
among random variables in the vector) and their limits as generalized Kiefer processes have been
studied in the statistics literature; see, e.g., [42, 6, 8, 9, 13], but the convergence is proved in the
space $\mathbb{D}([0,T]^k, \mathbb{R})$ of real-valued cadlag functions defined on $[0,T]^k$, $k \ge 2$, endowed with the
generalized Skorohod $J_1$ topology in [35] and [48]. In our setting, it is necessary to prove the convergence
in the space $\mathbb{D}([0,T], \mathbb{D}([0,T]^k, \mathbb{R}))$ of function-valued cadlag functions defined on $[0,T]$,
endowed with the standard Skorohod $J_1$ topology for $\mathbb{D}([0,T]^k, \mathbb{R})$-valued cadlag functions.

Literature review. Most of the literature on fork-join networks is on models with single-server 
service stations. We only give a brief summary here on relevant work in heavy traffic. These 
studies are in the conventional (single-server) heavy-traffic regime. In Varma’s dissertation [53], 
the diffusion-scaled workload processes and unsynchronized queueing processes in some fork-join 
network models with ES are shown to converge weakly to certain multi-dimensional reflected Brownian
motions. The stationary distributions of the system response time and the processes counting
the number of tasks in unsynchronized queues are specified by some partial differential equations 
(PDEs). Nguyen [36] shows the diffusion-scaled processes counting the queue lengths at each service 
station of a single-class fork-join network model with ES converge to a reflected Brownian motion 
in a polyhedral cone of the nonnegative orthant. Nguyen [37] discusses the difficult challenges with 
multiclass fork-join models with ES. As we have noted above, for a fork-join network with feedback 
and NES, Atar et al. [2] show that a dynamic priority discipline achieves throughput optimality
asymptotically in the conventional heavy-traffic regime, as a consequence of the asymptotic
equivalence between NES and ES constraints. 

Very little work has been done for fork-join networks with multi-server service stations. Ko 
and Serfozo [25] consider a fork-join network model with a single class of Poisson arrivals and K 
parallel service stations with multiple servers at each station and exponential service times, and
obtain an approximation for the distribution of the system response time in equilibrium under 
the NES constraint. Dai [10] provides an exact simulation algorithm to approximate the system 
response time in equilibrium for the same Markovian model in [25] by using a “coupling from the 
past” method. Zviran [58] studies optimal control of multi-server feedforward fork-join networks 
with exponential service times in the conventional heavy-traffic regime and shows that FCFS is 
asymptotically optimal and the resequencing disruption becomes asymptotically negligible. Zaied 
[57] calculates mean offered-load functions of fork-join networks with NES and multiple processing 
stages when the arrival process is time-inhomogeneous Poisson and service times for parallel tasks 
are independent, and studies staffing of time-varying emergency departments and synchronization 
delays under Markovian assumptions. Both dissertations of Zviran [58] and Zaied [57] are motivated 
from applications in patient flow analysis. Gurvich and Ward [17] study optimal matching policies 
for a pure join model (Markovian) with multiple classes of jobs under certain matching constraints. 

This work contributes to the recent development for non-Markovian many-server queueing models.
We only mention those that are most relevant to our work due to the large volume of papers
on many-server models. Krichagina and Puhalskii [28] first observe that the system dynamics of an
infinite-server queueing model can be represented by an integral functional of a sequential empirical
process driven by service times. They show that the diffusion-scaled processes counting the number
of jobs in the system can be approximated by a functional of a standard Kiefer process driven
by service times. Pang and Whitt [39, 41] generalize that approach to establish two-parameter
process limits for $G/G/\infty$ queues when the service times are i.i.d. and weakly dependent, respectively.
Reed [45] and Puhalskii and Reed [44] have observed a relationship between finite-server
and infinite-server queues and generalized the approach in [28] to obtain the diffusion limits for 
G/GI/N queues in the Halfin-Whitt regime. Mandelbaum and Momcilovic [33] generalize the 
approach by Reed [45] to study G/GI/N + GI queues with abandonment. All these papers use 
sequential empirical processes driven by a sequence of univariate random variables. Our approach 
to study fork-join networks with NES uses multiparameter sequential empirical processes driven by 
a sequence of i.i.d. random vectors and properties of multiparameter processes and martingales. 

Notation. Throughout the paper, the following notation will be used. $\mathbb{R}$ and $\mathbb{R}_+$ ($\mathbb{R}^d$ and $\mathbb{R}^d_+$,
respectively) denote the sets of real and real non-negative numbers ($d$-dimensional vectors, respectively,
$d \ge 2$). $\mathbb{Z}_+$ is the set of non-negative integers. $\mathbb{N}$ denotes the set of natural numbers. For $a, b \in \mathbb{R}$,
we denote $a \wedge b := \min(a, b)$ and $a \vee b := \max(a, b)$. For $x \in \mathbb{R}$, let $x^+ := \max\{x, 0\}$ and
$x^- := -\min\{x, 0\}$. For any $x \in \mathbb{R}_+$, $\lfloor x \rfloor$ is used to denote the largest integer less than or equal to
$x$. We use bold letters to denote vectors, e.g., $\mathbf{x} := (x_1, \dots, x_N) \in \mathbb{R}^N$. $\mathbf{0}$ denotes the vector whose
components are all 0. For $\mathbf{x}, \mathbf{y} \in \mathbb{R}^N$, we interpret the inequalities $\mathbf{x} \le \mathbf{y}$, $\mathbf{x} \ge \mathbf{y}$ and $\mathbf{x} > \mathbf{y}$ in the componentwise
sense, and let $\mathbf{x} \wedge \mathbf{y} = (x_1 \wedge y_1, \dots, x_N \wedge y_N)$. We use $1(A)$ to denote the indicator function of a
set $A$. The abbreviation a.s. means almost surely. For any univariate distribution function $F(\cdot)$,
we denote $F^c(\cdot) = 1 - F(\cdot)$. For $\mathbf{a} \in \mathbb{R}^2_+$ and $a \in \mathbb{R}_+$, we call $A_{\mathbf{a}}(\delta)$ (resp. $A_a(\delta)$) a $\delta$-grid of
$[0, a_1] \times [0, a_2]$ (resp. $[0, a]$) if $A_{\mathbf{a}}(\delta)$ (resp. $A_a(\delta)$) is a finite partition of $[0, a_1] \times [0, a_2]$ (resp.
$[0, a]$), where each element of the partition is the rectangle $[s_1, t_1) \times [s_2, t_2)$ (resp. $[s, t)$), satisfying
$0 \le s_k < t_k \le a_k$ for $k = 1, 2$ (resp. $0 \le s < t$), and $\min_{k=1,2}(t_k - s_k) > \delta$ (resp. $t - s > \delta$). For
two real-valued functions $f$ and $g$, we write $f(x) = O(g(x))$ if $\limsup_{x \to \infty} |f(x)/g(x)| < \infty$.

All random variables and processes are defined on a common probability space $(\Omega, \mathcal{F}, P)$. For
any two complete separable metric spaces $S_1$ and $S_2$, we denote $S_1 \times S_2$ as their product space,
endowed with the maximum metric, i.e., the maximum of the two metrics on $S_1$ and $S_2$. $S^k$ is used
to represent the $k$-fold product space of any complete and separable metric space $S$ for $k \in \mathbb{N}$. For a
complete separable metric space $S$, $\mathbb{D}([0, \infty), S)$ denotes the space of all $S$-valued cadlag functions
on $[0, \infty)$, and is endowed with the Skorohod $J_1$ topology (see, e.g., [5, 15, 55]). Denote $\mathbb{D} :=
\mathbb{D}([0, \infty), \mathbb{R})$. The space $\mathbb{D}([0, \infty), \mathbb{D})$, denoted as $\mathbb{D}_{\mathbb{D}}$, is endowed with the Skorohod $J_1$ topology,
that is, both the inner and the outer $\mathbb{D}$ spaces are endowed with the Skorohod $J_1$ topology. For a
complete separable metric space $S$, the space $\mathbb{D}([0, \infty)^2, S)$ is the space of all $S$-valued “continuous
from above with limits from below” functions on $[0, \infty)^2$, and is endowed with the same metric as
defined by [18]. $\mathbb{D}_2 := \mathbb{D}([0, 1]^2, \mathbb{R})$ denotes the space of all “continuous from above with limits
from below” functions on the unit square $[0, 1]^2$ in the sense of Neuhaus [35], and is endowed with
the same metric $d_{\mathbb{D}_2}$ as in [35]. Weak convergence of probability measures $\mu_n$ to $\mu$ will be denoted
as $\mu_n \Rightarrow \mu$. For a sequence of processes $\{X^n : n \ge 1\}$ and a process $X$, we use the notation $X^n \xrightarrow{\mathrm{f.d.d.}} X$
to denote the convergence in finite-dimensional distributions of $X^n$ to $X$.


2 The Infinite-Server Fork-Join Network Model 

2.1 Model and Assumptions 

In this section, we present a detailed description of our infinite-server fork-join network model and 
the assumptions. As shown in Figure 1, there is a single class of jobs, and each job is forked into K 
parallel tasks, $K \ge 2$. Each task is processed in a service station with multiple servers under the
FCFS discipline. There is an infinite number of servers at each station. After service completion, 
each task will join a waiting buffer for synchronization associated with each service station, and 
when all tasks of the same job are completed, they will be synchronized and leave the system. Here 
we assume that the synchronization process takes zero amount of time. 

Let $A := \{A(t) : t \ge 0\}$ be the arrival process of jobs with $\tau_i$ representing the arrival time
of the $i$th job, $i \in \mathbb{N}$. Let $\{\boldsymbol{\eta}^i : i \ge 1\}$, with $\boldsymbol{\eta}^i := (\eta^i_1, \dots, \eta^i_K)$, denote the i.i.d. service time vectors of the parallel
tasks. The joint distribution of the service time vector $\boldsymbol{\eta}^i$ for the $i$th job is $F(\mathbf{x}) := F(x_1, \dots, x_K)$
for $x_k \ge 0$, $k = 1, \dots, K$. Their marginal distributions are $F_k(x)$, for $x \ge 0$, $k = 1, \dots, K$. The
joint distribution of any two service times $\eta^i_j$ and $\eta^i_k$ is $F_{j,k}(x_j, x_k) := P(\eta^i_j \le x_j, \eta^i_k \le x_k)$ for
$x_j, x_k \ge 0$, $j, k = 1, \dots, K$. Note $F_{j,k}(\cdot, \cdot) = F_k(\cdot)$ when $j = k$ for $j, k = 1, \dots, K$. We denote

$$F^c_{j,k}(x_j, x_k) := P(\eta^i_j > x_j, \eta^i_k > x_k) = 1 - F_j(x_j) - F_k(x_k) + F_{j,k}(x_j, x_k), \quad x_j, x_k \ge 0,\ j, k = 1, \dots, K.$$

Note $F^c_{j,k}(\cdot, \cdot) = F^c_k(\cdot)$ when $j = k$ for $j, k = 1, \dots, K$. Let $\eta^i_m := \max\{\eta^i_1, \dots, \eta^i_K\}$ be the maximum
of the components in the service vector $\boldsymbol{\eta}^i$, and $F_m(x) := P(\eta^i_m \le x) = F(x, \dots, x)$ for $x \ge 0$.
(Throughout the paper, we use subscript “m” to index quantities and processes associated with 
the maximum.) The service process is assumed to be independent of the arrivals. We exclude the 
case of perfectly positively correlated parallel services since that will lead to empty waiting buffers 
for synchronization. 

Let $X_k := \{X_k(t) : t \ge 0\}$ be the process counting the number of tasks in service at
service station $k$, and $Y_k := \{Y_k(t) : t \ge 0\}$ be the process counting the number of tasks in the waiting
buffer for synchronization (unsynchronized queue) after service completion at service station $k$,
$k = 1, \dots, K$. Let $S := \{S(t) : t \ge 0\}$ be the process counting the number of synchronized jobs and
$D_k := \{D_k(t) : t \ge 0\}$ be the process counting the number of tasks that have completed service at
station $k$, $k = 1, \dots, K$. Denote $X := (X_1, \dots, X_K)$, $Y := (Y_1, \dots, Y_K)$ and $D := (D_1, \dots, D_K)$. We
assume that the system starts empty.

Assuming that the arrival process $A(t)$ is Poisson with rate $\lambda$, by Poisson random measure
theory, we can easily obtain the following properties of the processes $X(t)$, $Y(t)$ and $S(t)$ at each
time $t$.

Proposition 2.1. If the arrival process $A(t)$ is Poisson with rate $\lambda$, then at each time $t \ge 0$, for
$k = 1, \dots, K$, $X_k(t)$ has a Poisson distribution with rate $\lambda \int_0^t F^c_k(s)\,ds$, $Y_k(t)$ has a Poisson distribution
with rate $\lambda \int_0^t (F^c_m(s) - F^c_k(s))\,ds$, and $S(t)$ has a Poisson distribution with rate $\lambda \int_0^t F_m(s)\,ds$. For
each time $t \ge 0$ and $j, k = 1, \dots, K$,

$$\mathrm{Cov}(X_j(t), X_k(t)) = \lambda \int_0^t F^c_{j,k}(s, s)\,ds, \tag{2.1}$$

$$\mathrm{Cov}(Y_j(t), Y_k(t)) = \lambda \int_0^t \big(F_{j,k}(s, s) - F_m(s)\big)\,ds, \tag{2.2}$$

$$\mathrm{Cov}(X_j(t), Y_k(t)) = \lambda \int_0^t \big(F_k(s) - F_{j,k}(s, s)\big)\,ds. \tag{2.3}$$

For each time $t \ge 0$ and $k = 1, \dots, K$, $S(t)$ is independent of $X_k(t)$ and $Y_k(t)$. When $K = 2$, $Y_1(t)$
and $Y_2(t)$ are independent for each $t \ge 0$.
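
To make the Poisson parameters in Proposition 2.1 concrete, the following Python sketch evaluates the means of $X_k(t)$, $Y_k(t)$ and $S(t)$ by numerical integration. It is only an illustration under an assumption that is not made in the text: the $K$ service times are independent exponentials, so that $F_m$ is the product of the marginals. The function names `integrate` and `poisson_means` are hypothetical.

```python
import math

def integrate(f, a, b, steps=20000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def poisson_means(lam, mus, t):
    """Means of X_k(t), Y_k(t) and S(t) as given in Proposition 2.1.

    Assumption (not from the text): the service times of the K parallel
    tasks are independent exponentials with rates mus[k], so that
    F_m(s) = F_1(s) * ... * F_K(s).
    """
    # Marginal distribution functions F_k.
    F = [lambda s, mu=mu: 1.0 - math.exp(-mu * s) for mu in mus]
    # Distribution function of the maximum service time.
    F_m = lambda s: math.prod(Fk(s) for Fk in F)
    # E[X_k(t)] = lam * int_0^t F_k^c(s) ds
    mean_X = [lam * integrate(lambda s, Fk=Fk: 1.0 - Fk(s), 0.0, t) for Fk in F]
    # E[Y_k(t)] = lam * int_0^t (F_m^c(s) - F_k^c(s)) ds = lam * int_0^t (F_k(s) - F_m(s)) ds
    mean_Y = [lam * integrate(lambda s, Fk=Fk: Fk(s) - F_m(s), 0.0, t) for Fk in F]
    # E[S(t)] = lam * int_0^t F_m(s) ds
    mean_S = lam * integrate(F_m, 0.0, t)
    return mean_X, mean_Y, mean_S
```

For each station $k$ the three means add up to $\lambda t$, the expected number of arrivals by time $t$, because every task of an arrived job is either in service, waiting for synchronization, or already synchronized.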

When the arrival process $A(t)$ is general, we will obtain heavy-traffic limits for the fluid and
diffusion scaled processes of $(X, Y, S)$ jointly. We will let the arrival rate grow large for the system
to be in heavy traffic. For that, we consider a sequence of such systems indexed by $n$ and use
superscript $n$ for the processes $A, X, Y, D, S$, and the arrival times $\{\tau^n_i : i \ge 1\}$, but we let the
service times $\{\boldsymbol{\eta}^i : i \ge 1\}$ and their distribution functions be independent of $n$. We make the
following assumption on the arrival process $A^n$.

Assumption 1: FCLT for arrivals. There exist: (i) a continuous nondecreasing deterministic
real-valued function $\bar{a}$ on $[0, \infty)$ with $\bar{a}(0) = 0$ and (ii) a stochastic process $\hat{A}$ with continuous sample
paths, such that

$$\hat{A}^n := n^{-1/2}(A^n - n\bar{a}) \Rightarrow \hat{A} \quad \text{in } \mathbb{D} \text{ as } n \to \infty. \tag{2.4}$$

■

It follows from (2.4) that we have the associated FWLLN

$$\bar{A}^n := \frac{A^n}{n} \Rightarrow \bar{a} \quad \text{in } \mathbb{D} \text{ as } n \to \infty. \tag{2.5}$$

When the arrival process is renewal, the limit in (2.5) is $\bar{a}(t) = \lambda t$, for $t \ge 0$ and some positive
constant $\lambda$, and the limit in (2.4) is $\hat{A} = \sqrt{\lambda c_a^2}\, B_a$, where $c_a^2$ is the squared coefficient of variation
(SCV) of an interarrival time, and $B_a$ is a standard Brownian motion (BM).
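
As a small worked illustration of the renewal case, the sketch below computes the variance $\lambda c_a^2 t$ of the limit $\sqrt{\lambda c_a^2}\, B_a(t)$ from the mean and variance of an interarrival time. The function name and its parameters are hypothetical and are used only for this example.

```python
def arrival_limit_variance(mean_interarrival, var_interarrival, t):
    """Variance of sqrt(lam * c_a^2) * B_a(t) for a renewal arrival process.

    lam is the arrival rate (reciprocal of the mean interarrival time) and
    c_a^2 is the squared coefficient of variation of an interarrival time.
    """
    lam = 1.0 / mean_interarrival
    scv = var_interarrival / mean_interarrival ** 2
    return lam * scv * t
```

With exponential interarrival times the SCV equals 1 and the variance reduces to $\lambda t$, while Erlang-2 interarrival times with the same mean have SCV 1/2 and give half of that value.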

We also make a regularity assumption on the joint service-time distribution function $F(\mathbf{x})$.

Assumption 2: Service time distributions. The joint distribution function $F(\mathbf{x})$ of the
service time vectors $\{\boldsymbol{\eta}^i : i \in \mathbb{N}\}$ is continuous. ■

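From the graphical representation of the system dynamics in Figure 2(a), we can write, for each
$t \ge 0$ and $k = 1, \dots, K$,

$$X^n_k(t) = \sum_{i=1}^{A^n(t)} 1(\tau^n_i + \eta^i_k > t), \tag{2.6}$$

$$\begin{aligned}
Y^n_k(t) &= \sum_{i=1}^{A^n(t)} 1\big(\tau^n_i + \eta^i_k \le t \text{ and } \tau^n_i + \eta^i_{k'} > t \text{ for some } k' \ne k\big) \\
&= \sum_{i=1}^{A^n(t)} \big(1(\tau^n_i + \eta^i_k \le t) - 1(\tau^n_i + \eta^i_m \le t)\big) \\
&= \sum_{i=1}^{A^n(t)} \big(1(\tau^n_i + \eta^i_m > t) - 1(\tau^n_i + \eta^i_k > t)\big),
\end{aligned} \tag{2.7}$$

$$S^n(t) = \sum_{i=1}^{A^n(t)} 1(\tau^n_i + \eta^i_m \le t) = \sum_{i=1}^{A^n(t)} 1(\tau^n_i + \eta^i_j \le t,\ \forall j), \tag{2.8}$$

$$D^n_k(t) = \sum_{i=1}^{A^n(t)} 1(\tau^n_i + \eta^i_k \le t). \tag{2.9}$$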

The following balance equations hold for each $t \ge 0$ and $k = 1, \dots, K$:

$$D^n_k(t) = A^n(t) - X^n_k(t), \tag{2.10}$$

$$Y^n_k(t) = D^n_k(t) - S^n(t). \tag{2.11}$$
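
The counting representations (2.6)-(2.9) and the balance equations (2.10)-(2.11) can be checked on a sample path. The Python sketch below counts each process directly from the arrival times and service vectors of an infinite-server system, where every task enters service upon arrival. The helper `simulate_jobs`, with Poisson arrivals and independent unit-rate exponential service times, is a hypothetical test-data generator and is not part of the model assumptions in the text.

```python
import random

def simulate_jobs(lam, horizon, K, seed=0):
    """Hypothetical test data: Poisson(lam) arrivals on [0, horizon] and
    independent exponential(1) service times for the K parallel tasks."""
    rng = random.Random(seed)
    arrivals, services, t = [], [], 0.0
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            return arrivals, services
        arrivals.append(t)
        services.append([rng.expovariate(1.0) for _ in range(K)])

def count_processes(arrivals, services, t, K):
    """Evaluate A(t), X_k(t), Y_k(t), D_k(t) and S(t) at a fixed time t.

    arrivals[i] is the arrival time of job i and services[i] is its vector
    of K service times; each task starts service on arrival.
    """
    A, S = 0, 0
    X, Y, D = [0] * K, [0] * K, [0] * K
    for tau, eta in zip(arrivals, services):
        if tau > t:
            continue  # job has not arrived by time t
        A += 1
        synchronized = tau + max(eta) <= t  # all tasks of the job are done
        if synchronized:
            S += 1
        for k in range(K):
            if tau + eta[k] > t:
                X[k] += 1          # task still in service at station k
            else:
                D[k] += 1          # task has completed service at station k
                if not synchronized:
                    Y[k] += 1      # completed but waiting for sibling tasks
    return A, X, Y, D, S
```

For example, `count_processes([0.0, 1.0], [[1.0, 3.0], [0.5, 0.5]], 2.0, 2)` describes two jobs with two tasks each: at time 2 the first job still has one task in service at the second station, so its completed task waits in the first buffer, while the second job has been synchronized.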

As we have remarked in the introduction, by previous work on $G/GI/\infty$ queues [28], each
individual process $X^n_k$ and $D^n_k$ (resp. $S^n$) can be represented by an integral of a sequential empirical
process driven by a sequence of i.i.d. random variables $\{\eta^i_k : i \ge 1\}$ (resp. $\{\eta^i_m : i \ge 1\}$) for each
$k = 1, \dots, K$. Thus, Gaussian limits for the diffusion-scaled processes $\hat{X}^n_k$, $\hat{D}^n_k$ and $\hat{S}^n$ in heavy traffic
for each $k$ can be established, and as a consequence, a Gaussian limit for the diffusion-scaled process
$\hat{Y}^n_k$ can be obtained from those of $\hat{D}^n_k$ and $\hat{S}^n$, $k = 1, \dots, K$. However, that approach does not give a
characterization of the joint Gaussian distribution of the limiting processes of the diffusion-scaled
processes $(\hat{X}^n, \hat{Y}^n, \hat{S}^n)$.

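We will represent all the processes $X^n$, $Y^n$, $S^n$ as integrals of a multiparameter sequential empirical
process $\bar{K}^n := \{\bar{K}^n(t, \mathbf{x}) : t \ge 0, \mathbf{x} \in \mathbb{R}^K_+\}$ driven by the sequence of service vectors $\{\boldsymbol{\eta}^i : i \ge 1\}$:

$$\bar{K}^n(t, \mathbf{x}) := \frac{1}{n} \sum_{i=1}^{\lfloor nt \rfloor} 1(\boldsymbol{\eta}^i \le \mathbf{x}), \quad t \ge 0,\ \mathbf{x} \in \mathbb{R}^K_+. \tag{2.12}$$

That is, we write, for $t \ge 0$ and $k = 1, \dots, K$,

$$X^n_k(t) = n \int_0^t \int_{\mathbb{R}^K_+} 1(s + x_k > t)\, d\bar{K}^n(\bar{A}^n(s), \mathbf{x}), \tag{2.13}$$

$$Y^n_k(t) = n \int_0^t \int_{\mathbb{R}^K_+} \big(1(s + x_k \le t) - 1(s + x_j \le t,\ \forall j)\big)\, d\bar{K}^n(\bar{A}^n(s), \mathbf{x}), \tag{2.14}$$

$$S^n(t) = n \int_0^t \int_{\mathbb{R}^K_+} 1(s + x_j \le t,\ \forall j)\, d\bar{K}^n(\bar{A}^n(s), \mathbf{x}). \tag{2.15}$$

The integrals in (2.13), (2.14) and (2.15) are well defined as Stieltjes integrals for functions of
bounded variation as integrators.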
2.2 An FCLT for Multiparameter Sequential Empirical Processes

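We present an FCLT for multiparameter sequential empirical processes $U^n := \{U^n(t, \mathbf{x}) : t \ge 0, \mathbf{x} \in
[0, 1]^K\}$ driven by a sequence of i.i.d. random vectors with uniform marginals:

$$U^n(t, \mathbf{x}) := \frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor nt \rfloor} \big(1(\boldsymbol{\xi}^i \le \mathbf{x}) - H(\mathbf{x})\big), \quad t \ge 0,\ \mathbf{x} \in [0, 1]^K, \tag{2.16}$$

where for each $i \in \mathbb{N}$, $\boldsymbol{\xi}^i := (\xi^i_1, \dots, \xi^i_K)$ is a vector of nonnegative random variables with continuous
joint distribution function $H(\cdot)$ and uniform marginals over $[0, 1]$.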

The convergence for the processes $U^n(t, \mathbf{x})$ is established in the space $\mathbb{D}([0, \infty), \mathbb{D}([0, 1]^K, \mathbb{R}))$.
We remark that this theorem is in the same spirit as Lemma 3.1 in [28], where an FCLT is proved
for the two-parameter process $U^n(t, x)$ in the univariate case in the space $\mathbb{D}([0, \infty), \mathbb{D}([0, 1], \mathbb{R}))$.
We generalize that result to the multivariate setting.

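Theorem 2.1. The multiparameter sequential empirical processes $U^n(t, \mathbf{x})$ defined in (2.16) converge
weakly to a continuous Gaussian limit,

$$U^n(t, \mathbf{x}) \Rightarrow U(t, \mathbf{x}) \quad \text{in } \mathbb{D}([0, \infty), \mathbb{D}([0, 1]^K, \mathbb{R})) \text{ as } n \to \infty, \tag{2.17}$$

where $U(t, \mathbf{x})$ is a continuous Gaussian random field with mean function $E[U(t, \mathbf{x})] = 0$ and covariance
function

$$\mathrm{Cov}(U(t, \mathbf{x}), U(s, \mathbf{y})) = (t \wedge s)\big(H(\mathbf{x} \wedge \mathbf{y}) - H(\mathbf{x}) H(\mathbf{y})\big), \quad t, s \ge 0,\ \mathbf{x}, \mathbf{y} \in [0, 1]^K.$$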
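
The covariance function in Theorem 2.1 is simple to evaluate once a copula $H$ is fixed. The Python sketch below does so; the default choice of the independence copula $H(\mathbf{x}) = x_1 \cdots x_K$ is an assumption made only for illustration, and the function names are hypothetical.

```python
import math

def independence_copula(u):
    """H(u) = u_1 * ... * u_K, the copula of independent components
    (an illustrative choice, not one prescribed by the text)."""
    return math.prod(u)

def kiefer_cov(t, x, s, y, H=independence_copula):
    """Cov(U(t, x), U(s, y)) = (t ^ s) * (H(x ^ y) - H(x) * H(y)),
    where ^ denotes the (componentwise) minimum."""
    x_min_y = [min(a, b) for a, b in zip(x, y)]
    return min(t, s) * (H(x_min_y) - H(x) * H(y))
```

In the univariate case with $t = s = 1$ this reduces to the Brownian bridge covariance $x \wedge y - xy$, and for general $K$ the covariance vanishes whenever $\mathbf{x}$ or $\mathbf{y}$ is the all-ones vector, since then $H(\mathbf{x} \wedge \mathbf{y}) = H(\mathbf{x}) H(\mathbf{y})$.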

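To show the FCLT for the processes $(\hat{X}^n, \hat{Y}^n, \hat{S}^n)$, we define the diffusion-scaled multiparameter
sequential empirical processes $\hat{K}^n := \{\hat{K}^n(t, \mathbf{x}) : t \ge 0, \mathbf{x} \in \mathbb{R}^K_+\}$ by

$$\hat{K}^n(t, \mathbf{x}) := \frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor nt \rfloor} \big(1(\boldsymbol{\eta}^i \le \mathbf{x}) - F(\mathbf{x})\big), \quad t \ge 0,\ \mathbf{x} \in \mathbb{R}^K_+. \tag{2.18}$$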

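Theorem 2.1 can be applied to show an FCLT for the processes $\hat{K}^n(t, \mathbf{x})$. Define $\mathbf{F} : \mathbb{R}^K_+ \to [0, 1]^K$
with $\mathbf{F}(\mathbf{x}) = (F_1(x_1), \dots, F_K(x_K))$. By Sklar’s theorem [46], for any multivariate distribution function
$F$, there exists a unique multivariate distribution function $H$ (called a “copula”) with uniform
marginals on $[0, 1]$ such that $F(\mathbf{x}) = H(\mathbf{F}(\mathbf{x}))$ when the marginal distribution functions $F_k$,
$k = 1, \dots, K$, are continuous. Then, $\hat{K}^n(\cdot, \cdot)$ can be represented as a composition of $U^n(\cdot, \cdot)$ with
$\mathbf{F}(\cdot)$ in the second component, i.e.,

$$\hat{K}^n(t, \mathbf{x}) = U^n(t, \mathbf{F}(\mathbf{x})), \quad t \ge 0,\ \mathbf{x} \in \mathbb{R}^K_+.$$

Thus, it follows from Theorem 2.1 that the processes $\hat{K}^n(t, \mathbf{x})$ converge in distribution:

$$\hat{K}^n(t, \mathbf{x}) = U^n(t, \mathbf{F}(\mathbf{x})) \Rightarrow \hat{K}(t, \mathbf{x}) := U(t, \mathbf{F}(\mathbf{x})) \quad \text{in } \mathbb{D}([0, \infty), \mathbb{D}(\mathbb{R}^K_+, \mathbb{R})) \text{ as } n \to \infty, \tag{2.19}$$

which implies that

$$\bar{K}^n(t, \mathbf{x}) \Rightarrow \bar{k}(t, \mathbf{x}) := t F(\mathbf{x}) \quad \text{in } \mathbb{D}([0, \infty), \mathbb{D}(\mathbb{R}^K_+, \mathbb{R})) \text{ as } n \to \infty. \tag{2.20}$$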
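
The copula representation $F(\mathbf{x}) = H(\mathbf{F}(\mathbf{x}))$ can be illustrated with a concrete dependent example. The Python sketch below builds a joint service-time distribution for $K = 2$ from exponential marginals and the Farlie-Gumbel-Morgenstern (FGM) copula. Both choices, the default rates and the parameter `theta` are assumptions made purely for illustration and do not come from the text.

```python
import math

def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula, a valid copula for -1 <= theta <= 1;
    theta = 0 gives independence."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def exp_cdf(x, rate):
    """Marginal distribution function of an exponential service time."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def joint_cdf(x1, x2, rates=(1.0, 2.0), theta=0.5):
    """F(x1, x2) = H(F_1(x1), F_2(x2)), as in Sklar's theorem."""
    return fgm_copula(exp_cdf(x1, rates[0]), exp_cdf(x2, rates[1]), theta)

def max_cdf(x, rates=(1.0, 2.0), theta=0.5):
    """F_m(x) = F(x, x), the distribution function of the maximum
    of the two service times."""
    return joint_cdf(x, x, rates, theta)
```

Letting one argument of `joint_cdf` grow large recovers the other marginal, which reflects the uniform marginals of the copula, and `max_cdf` gives the function $F_m$ that drives the synchronized-job process.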
2.3 FWLLN and FCLT

We define fluid-scaled processes $\bar{X}^n$, $\bar{Y}^n$ and $\bar{S}^n$ by

$$\bar{X}^n := \frac{1}{n} X^n, \quad \bar{Y}^n := \frac{1}{n} Y^n, \quad \bar{S}^n := \frac{1}{n} S^n. \tag{2.21}$$

The FWLLN for (X n ,Y n , S n ) is stated in the following theorem. 

Theorem 2.2 (FWLLN). Under Assumptions 1 and 2, the fluid-scaled processes converge to 
deterministic fluid functions, 

(. A n , X n ,Y n , S n ) =* (a, X, Y, S) (2.22) 


in H) 2K+2 as n —)■ oo, where the limits are all deterministic functions: a is the limit in (2.5), for 
each t > 0, 


X(t):=(XxC t),...,X K (t)% X k (t) := f Ff(t-s)dd(s), for k = l,...,K, (2.23) 

Jo 

Y(t) := (Y\{t), ...,Y K (t)), Y k (t) := f'(F^(t - s) - Ff{t - s))dd(s), for k = l,...,K, (2.24) 
Jo 

S(i) := [ F m (t — s)da(s). 

Jo 


(2.25) 


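When $\bar a(t) = \lambda t$ for a constant arrival rate $\lambda > 0$ and $E[\eta_k^1] < \infty$ for $k = 1,\dots,K$,

\[ \bar X_k(\infty) := \lim_{t\to\infty} \bar X_k(t) = \lambda E[\eta_k^1], \quad k = 1,\dots,K, \tag{2.26} \]

\[ \bar Y_k(\infty) := \lim_{t\to\infty} \bar Y_k(t) = \lambda\big(E[\eta_m^1] - E[\eta_k^1]\big), \quad k = 1,\dots,K, \tag{2.27} \]

\[ \lim_{t\to\infty} \frac{\bar S(t)}{t} = \lambda. \tag{2.28} \]

We define the diffusion scaling of $X^n$, $Y^n$ and $S^n$ by

\[ \hat X^n := \sqrt{n}(\bar X^n - \bar X), \quad \hat Y^n := \sqrt{n}(\bar Y^n - \bar Y), \quad \hat S^n := \sqrt{n}(\bar S^n - \bar S). \tag{2.29} \]

We will show the following FCLT for these diffusion-scaled processes.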


Theorem 2.3 (FCLT). Under Assumptions 1 and 2, the diffusion-scaled processes converge in distribution,

\[ (\hat A^n, \hat K^n, \hat X^n, \hat Y^n, \hat S^n) \Rightarrow (\hat A, \hat K, \hat X, \hat Y, \hat S) \tag{2.30} \]

in $\mathbb{D} \times \mathbb{D}([0,\infty),\mathbb{D}_K) \times \mathbb{D}^{2K+1}$ as $n \to \infty$, where $\hat A$ is the limit in (3.6), $\hat K$ is the limit in (2.19), which is independent of $\hat A$, and for $t \ge 0$ and $k = 1,\dots,K$,

\[ \hat X(t) := \hat M_1(t) + \hat M_2(t), \quad \hat M_i(t) := (\hat M_{1,i}(t),\dots,\hat M_{K,i}(t)), \quad i = 1,2, \tag{2.31} \]

\[ \hat M_{k,1}(t) := \int_0^t F_k^c(t-s)\,d\hat A(s) = \hat A(t) - \int_0^t \hat A(s)\,dF_k^c(t-s), \tag{2.32} \]

\[ \hat M_{k,2}(t) := \int_0^t\!\!\int_{\mathbb{R}_+^K} \mathbf{1}(s + x_k > t)\,d\hat K(\bar a(s),x) = -\int_0^t\!\!\int_{\mathbb{R}_+^K} \mathbf{1}(s + x_k \le t)\,d\hat K(\bar a(s),x), \tag{2.33} \]

\[ \hat S(t) := \hat V_1(t) + \hat V_2(t), \tag{2.34} \]

\[ \hat V_1(t) := \int_0^t F_m(t-s)\,d\hat A(s) = -\int_0^t \hat A(s)\,dF_m(t-s), \tag{2.35} \]

\[ \hat V_2(t) := \int_0^t\!\!\int_{\mathbb{R}_+^K} \mathbf{1}(s + x_j \le t,\ \forall j)\,d\hat K(\bar a(s),x), \tag{2.36} \]

\[ \hat Y(t) := \hat Z_1(t) + \hat Z_2(t), \quad \hat Z_i(t) := (\hat Z_{1,i}(t),\dots,\hat Z_{K,i}(t)), \quad i = 1,2, \tag{2.37} \]

\[ \hat Z_{k,1}(t) := \int_0^t \big(F_k(t-s) - F_m(t-s)\big)\,d\hat A(s) = \int_0^t \hat A(s)\,d\big(F_m(t-s) - F_k(t-s)\big), \tag{2.38} \]

\[ \hat Z_{k,2}(t) := \int_0^t\!\!\int_{\mathbb{R}_+^K} \big(\mathbf{1}(s + x_k \le t) - \mathbf{1}(s + x_j \le t,\ \forall j)\big)\,d\hat K(\bar a(s),x) = -\hat M_{k,2}(t) - \hat V_2(t). \tag{2.39} \]

The processes $\hat M_2$, $\hat Z_2$ and $\hat V_2$ are defined in the mean-square sense, in the same way that the limit process with respect to a standard Kiefer process is defined for the $G/GI/\infty$ queue in [28, 39]. The limit processes are characterized in the next subsection.

2.4 Characterization of the Limit Processes 

In this section, we show the Gaussian property of the limiting processes $(\hat X, \hat Y)$ and $\hat S$ when the arrival limit process is a Brownian motion.

Theorem 2.4 (Gaussian Property). Under Assumptions 1 and 2, when the arrival limit process $\hat A$ is a Brownian motion, i.e., $\hat A(t) = c_a B_a(\bar a(t))$ for a standard Brownian motion $B_a$, a positive constant $c_a > 0$ and $t \ge 0$, the limiting processes $(\hat X, \hat Y)$ and $\hat S$ in Theorem 2.3 are well-defined continuous Gaussian processes. For each $t \ge 0$,

\[ (\hat X(t), \hat Y(t)) \overset{d}{=} N(0, \Sigma(t)), \quad \text{and} \quad \hat S(t) \overset{d}{=} N(0, \sigma^S(t)), \]

where for $j,k = 1,\dots,K$,

\[ \sigma^X_{jk}(t) := \mathrm{Cov}(\hat X_j(t), \hat X_k(t)) = \int_0^t \Big[F^c_{j,k}(t-s,t-s) + (c_a^2 - 1) F_j^c(t-s) F_k^c(t-s)\Big]\,d\bar a(s), \tag{2.40} \]

\[ \sigma^Y_{jk}(t) := \mathrm{Cov}(\hat Y_j(t), \hat Y_k(t)) = \int_0^t \Big[\big(F_{j,k}(t-s,t-s) - F_m(t-s)\big) + (c_a^2 - 1)\big(F_j(t-s) - F_m(t-s)\big)\big(F_k(t-s) - F_m(t-s)\big)\Big]\,d\bar a(s), \tag{2.41} \]

\[ \sigma^{XY}_{jk}(t) := \mathrm{Cov}(\hat X_j(t), \hat Y_k(t)) = \int_0^t \Big[\big(F_k(t-s) - F_{j,k}(t-s,t-s)\big) + (c_a^2 - 1)\Big(F_j^c(t-s)\big(F_k(t-s) - F_m(t-s)\big)\Big)\Big]\,d\bar a(s), \tag{2.42} \]

and

\[ \sigma^S(t) := \mathrm{Var}(\hat S(t)) = \int_0^t F_m(t-s)\,d\bar a(s) + (c_a^2 - 1)\int_0^t \big(F_m(t-s)\big)^2\,d\bar a(s). \tag{2.43} \]
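When the arrival rate function $\bar a(t) = \lambda t$ for a positive constant $\lambda > 0$,

\[ (\hat X(t), \hat Y(t)) \Rightarrow (\hat X(\infty), \hat Y(\infty)) \overset{d}{=} N(0, \Sigma(\infty)) \quad \text{as } t \to \infty, \qquad \lim_{t\to\infty} t^{-1}\mathrm{Var}(\hat S(t)) = \lambda c_a^2, \]

where for $j,k = 1,\dots,K$,

\[ \sigma^X_{jk}(\infty) := \lambda \int_0^\infty \Big[F^c_{j,k}(s,s) + (c_a^2 - 1) F_j^c(s) F_k^c(s)\Big]\,ds, \tag{2.44} \]

\[ \sigma^Y_{jk}(\infty) := \lambda \int_0^\infty \Big[\big(F_{j,k}(s,s) - F_m(s)\big) + (c_a^2 - 1)\big(F_j(s) - F_m(s)\big)\big(F_k(s) - F_m(s)\big)\Big]\,ds, \tag{2.45} \]

\[ \sigma^{XY}_{jk}(\infty) := \lambda \int_0^\infty \Big[\big(F_k(s) - F_{j,k}(s,s)\big) + (c_a^2 - 1)\Big(F_j^c(s)\big(F_k(s) - F_m(s)\big)\Big)\Big]\,ds. \tag{2.46} \]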

We make the following remarks on the Gaussian property of the limiting processes.

(i) When we set $c_a^2 = 1$, the variance and covariance formulas coincide with those in the Poisson arrival case in Proposition 2.1.

(ii) When $K = 2$ and $c_a^2 = 1$, $\mathrm{Cov}(\hat Y_j(t), \hat Y_k(t)) = 0$ for $t \ge 0$ and $k,j = 1,\dots,K$ with $k \ne j$, even if the service times of parallel tasks are correlated, since both terms inside the integral in (2.41) vanish.

(iii) We emphasize the interesting structure of the variances of $\hat X_k$ and $\hat Y_k$ and their covariances, $k = 1,\dots,K$. Recall that for $G/GI/\infty$ queues [28], the steady-state variance of the number of jobs in the system is the sum of two terms: the mean, and the coefficient $(c_a^2 - 1)$ multiplying an integral associated with the service time distribution. For example, when $E[\eta_k^1] < \infty$, the variance of the steady-state number of tasks in the $k$th service station is

\[ \mathrm{Var}(\hat X_k(\infty)) = \lambda E[\eta_k^1] + \lambda (c_a^2 - 1) \int_0^\infty \big(F_k^c(s)\big)^2\,ds, \quad k = 1,\dots,K. \]

It turns out that the steady-state variance of the number of tasks in the waiting buffer for synchronization has the same structure. For instance, when $E[\eta_k^1] < \infty$ for $k = 1,\dots,K$, the variance of the steady-state waiting buffer size at the $k$th service station is

\[ \mathrm{Var}(\hat Y_k(\infty)) = \lambda\big(E[\eta_m^1] - E[\eta_k^1]\big) + \lambda (c_a^2 - 1) \int_0^\infty \big(F_m^c(s) - F_k^c(s)\big)^2\,ds, \quad k = 1,\dots,K. \]

The same structure also exists for the covariances between $\hat X_j$ and $\hat Y_k$, as shown in (2.42), for $k,j = 1,\dots,K$.

(iv) The synchronized process does not have a Brownian motion limit, but its limiting process is Gaussian and has the same variability as the arrival process when the arrival rate is constant. This can also be explained by regarding the synchronized process as the departure process of a $G/GI/\infty$ queue with the same arrival process and with service times given by the maximum of each service vector (see [28, 39]).
To explore the impact of the correlation among the service times of each job's parallel tasks on the system dynamics, we consider the case when the service vector $\eta^1$ has the joint continuous distribution function

\[ F(x) = (1-\rho) \prod_{k=1}^K G(x_k) + \rho\, G\Big(\min_{1 \le k \le K}\{x_k\}\Big) \tag{2.47} \]

with a marginal continuous distribution function $G(\cdot)$, for $0 \le \rho < 1$, $x_k \ge 0$ and $k = 1,\dots,K$. Namely, the service times at the parallel stations have the same distribution, and are symmetrically correlated with a correlation parameter $\rho \in [0,1)$. We state the mean and covariance functions of the performance measures studied above as functions of the parameter $\rho$ in the following corollary.

Corollary 2.1. Under the same assumptions as in Theorem 2.4, when the service vector $\eta^1$ has the joint distribution function $F$ in (2.47), for each $t \ge 0$ and $k = 1,\dots,K$, $\bar X_k(t)$ and $\mathrm{Var}(\hat X_k(t))$ are the same as in (2.23) and (2.40), respectively,

\[ \bar Y_k(t) = (1-\rho) \int_0^t \Big[G(t-s)\big(1 - (G(t-s))^{K-1}\big)\Big]\,d\bar a(s), \]

\[ \mathrm{Var}(\hat Y_k(t)) = \int_0^t \Big[(1-\rho)\,G(t-s)\big(1 - (G(t-s))^{K-1}\big) + (1-\rho)^2 (c_a^2 - 1)(G(t-s))^2\big(1 - (G(t-s))^{K-1}\big)^2\Big]\,d\bar a(s), \]

\[ \mathrm{Cov}(\hat X_k(t), \hat Y_k(t)) = (c_a^2 - 1)(1-\rho) \int_0^t \Big[G^c(t-s)\,G(t-s)\big(1 - (G(t-s))^{K-1}\big)\Big]\,d\bar a(s), \]

for $j,k = 1,\dots,K$ and $j \ne k$,

\[ \mathrm{Cov}(\hat X_j(t), \hat X_k(t)) = \int_0^t \Big[(1-\rho)(G^c(t-s))^2 + \rho\,G^c(t-s) + (c_a^2 - 1)(G^c(t-s))^2\Big]\,d\bar a(s), \]

\[ \mathrm{Cov}(\hat Y_j(t), \hat Y_k(t)) = \int_0^t \Big[(1-\rho)(G(t-s))^2\big(1 - (G(t-s))^{K-2}\big) + (1-\rho)^2 (c_a^2 - 1)(G(t-s))^2\big(1 - (G(t-s))^{K-1}\big)^2\Big]\,d\bar a(s), \]

\[ \mathrm{Cov}(\hat X_j(t), \hat Y_k(t)) = (1-\rho) \int_0^t \Big[G(t-s)\,G^c(t-s) + (c_a^2 - 1)\,G^c(t-s)\,G(t-s)\big(1 - (G(t-s))^{K-1}\big)\Big]\,d\bar a(s), \]
We make several remarks on the impact of the correlation among the components of the service vector. The mean and the variance of $X_k(t)$ are not affected by the correlation, but the covariances of $X_j(t)$ and $X_k(t)$ increase linearly in $\rho$ for $t > 0$ and $j,k = 1,\dots,K$ with $j \ne k$. The mean of $Y_k(t)$ decreases linearly in $\rho$ and the mean of $S(t)$ increases linearly in $\rho$ for $t > 0$ and $k = 1,\dots,K$. The covariances of $Y_j(t)$ and $Y_k(t)$ decrease in $\rho$, in the order of $(1-\rho)^2$, but the covariances of $X_j(t)$ and $Y_k(t)$ decrease linearly in $\rho$ for $t > 0$ and $j,k = 1,\dots,K$. The variance of $S(t)$ increases in $\rho$, in the order of $\rho^2$, for $t > 0$. The intuitive interpretation of these observations is that positive correlation makes the parallel tasks more likely to finish close to each other, so that the waiting time for synchronization becomes shorter and more jobs are synchronized. It is also important to emphasize that the covariances of $Y_j(t)$ and $Y_k(t)$ and the covariances of $X_j(t)$ and $Y_k(t)$ decrease in different orders in the correlation parameter $\rho$ for $t > 0$ and $j,k = 1,\dots,K$. The same observations hold for the associated steady-state performance measures.
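The linear dependence of the fluid waiting buffer on the correlation parameter can be illustrated numerically. The following sketch assumes a constant arrival rate, so that the fluid formula of Corollary 2.1 becomes $(1-\rho)\lambda \int_0^t G(u)(1 - G(u)^{K-1})\,du$, and evaluates it by the trapezoidal rule. The function name, the step count and the exponential choice of $G$ in the usage note are illustrative assumptions, not part of the original analysis.

```python
import math


def fluid_buffer_mean(t, lam, rho, K, G, steps=20000):
    """Trapezoidal approximation of (1 - rho) * lam * int_0^t G(u) (1 - G(u)^(K-1)) du.

    Assumes a constant arrival rate lam, so that a(t) = lam * t.
    G is the common marginal service-time cdf (any callable on [0, t]).
    """
    h = t / steps

    def integrand(u):
        g = G(u)
        return g * (1.0 - g ** (K - 1))

    total = 0.5 * (integrand(0.0) + integrand(t))
    total += sum(integrand(i * h) for i in range(1, steps))
    return (1.0 - rho) * lam * h * total


def exp_cdf(u):
    """Rate-1 exponential cdf, used here only as an example marginal."""
    return 1.0 - math.exp(-u)
```

With `lam = 100`, `K = 2` and a long horizon such as `t = 40`, the value is close to 50 for `rho = 0` and to 25 for `rho = 0.5`, which reflects the factor $(1-\rho)$.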


2.5 Comparison with a fork-join network with ES 

We make a comparison with an associated fork-join network with ES. We use the superscript “ES” for the corresponding processes in this model. Let the arrival and service processes be the same as in the model described above. The only difference is the synchronization constraint. Here tasks are not tagged with a particular job, so that whenever there are completed tasks at all parallel service stations, the oldest completed task in each waiting buffer for synchronization will be synchronized. It is evident that when the arrival process $A(t)$ is Poisson, the processes $Y_k^{ES}(t)$ and $S^{ES}(t)$ do not have a Poisson distribution at each time $t > 0$, $k = 1,\dots,K$. In this case, for each $k = 1,\dots,K$, $X_k^{n,ES}$ and $D_k^{n,ES}$ will have the same representations as in (2.6) and (2.9), but the processes $S^{n,ES}$ and $Y_k^{n,ES}$ become

\[ S^{n,ES}(t) = \min_{1 \le k \le K}\{D_k^{n,ES}(t)\}, \quad t \ge 0, \tag{2.48} \]

and

\[ Y_k^{n,ES}(t) = D_k^{n,ES}(t) - S^{n,ES}(t) = D_k^{n,ES}(t) - \min_{1 \le j \le K}\{D_j^{n,ES}(t)\}, \quad t \ge 0. \tag{2.49} \]

Thus, at any time, at least one of the waiting buffers for synchronization is empty. It is evident that the processes $S^{n,ES}$ and $Y_k^{n,ES}$ cannot be represented as a single integral of the multiparameter sequential empirical process $K^n$ as in equations (2.15) and (2.14), respectively.

We now compare the steady-state mean values of the fluid limits of these processes when the arrival rate is constant. In the ES model, the synchronization process $S^{ES}$ can be represented as the minimum of the departure processes from all parallel stations, and these departure processes are dependent due to the correlation within the service vector of each job. Thus, we are unable to obtain a distributional approximation of the processes $S^{ES}$ and $Y_k^{ES}$, $k = 1,\dots,K$. However, for each $t \ge 0$, by applying the previous results on $G/GI/\infty$ queues [28], we can obtain the mean values of the fluid limits $\bar Y_k^{ES}(t)$, $k = 1,\dots,K$, and $\bar S^{ES}(t)$:

\[ \bar Y_k^{ES}(t) := \lambda \Big(\int_0^t F_k(s)\,ds - \min_{1 \le j \le K}\Big\{\int_0^t F_j(s)\,ds\Big\}\Big) \tag{2.50} \]

\[ \to \bar Y_k^{ES}(\infty) := \lambda \Big(\max_{1 \le j \le K}\{E[\eta_j^1]\} - E[\eta_k^1]\Big) \quad \text{as } t \to \infty, \]

\[ \bar S^{ES}(t) := \lambda \min_{1 \le j \le K}\Big\{\int_0^t F_j(s)\,ds\Big\} = \lambda t - \lambda \max_{1 \le j \le K}\Big\{\int_0^t F_j^c(s)\,ds\Big\}, \quad \text{and} \quad \lim_{t\to\infty} \frac{\bar S^{ES}(t)}{t} = \lambda. \tag{2.51} \]

Recall that the steady-state mean value of the waiting buffer for synchronization in our model is $\bar Y_k(\infty) = \lambda(E[\eta_m^1] - E[\eta_k^1])$ in (2.27), $k = 1,\dots,K$, which we denote by $\bar Y_k^{NES}(\infty)$ for comparison purposes. It is evident that the average waiting buffer sizes for synchronization under the NES constraint are at least as large as those under the ES constraint, even though the total synchronization throughput rates are the same, $\lim_{t\to\infty} \bar S^{ES}(t)/t = \lim_{t\to\infty} \bar S^{NES}(t)/t = \lambda$. We also observe that when the parallel service times are perfectly positively correlated, the difference $\bar Y_k^{NES}(\infty) - \bar Y_k^{ES}(\infty)$ becomes zero for $k = 1,\dots,K$. We summarize this comparison result in the following proposition.

Proposition 2.2. Under Assumptions 1 and 2, when $\bar a(t) = \lambda t$ for a positive arrival rate $\lambda > 0$ and $E[\eta_k^1] < \infty$ for $k = 1,\dots,K$,

\[ \bar Y_k^{NES}(\infty) - \bar Y_k^{ES}(\infty) = \lambda \Big(E[\eta_m^1] - \max_{1 \le j \le K}\{E[\eta_j^1]\}\Big) \ge 0, \quad \text{for } k = 1,\dots,K. \tag{2.52} \]

By extreme value theory, if the service vector has i.i.d. components such that the service time distribution lies in the domain of attraction of the Gumbel extremal distribution, then we have $a_K(\eta_m^1 - b_K) \Rightarrow Z$ as $K \to \infty$, where $Z$ has a Gumbel distribution, and $a_K$ and $b_K$ are constants depending on $K$; see Chapter 1 in [31]. The Gumbel distribution has cdf $P(Z \le z) = e^{-e^{-z}}$, $z \in \mathbb{R}$, with mean $E[Z] = \gamma \approx 0.5772$, the Euler-Mascheroni constant, and standard deviation $\pi/\sqrt{6} \approx 1.2825$ (so that $\mathrm{Var}(Z) = \pi^2/6$). For one example, if the service vector has i.i.d. components with an exponential distribution of rate 1, then $a_K = 1$ and $b_K = \ln K$ (see Example 1.7.2 of [31]), and for $k = 1,\dots,K$,

\[ \bar Y_k^{NES}(\infty) - \bar Y_k^{ES}(\infty) = \lambda \Big(\sum_{j=1}^K \frac{1}{j} - 1\Big) \sim \lambda\,(\ln(K) - 1) \quad \text{as } K \to \infty. \tag{2.53} \]
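For the i.i.d. rate-1 exponential example, the exact gap and its logarithmic approximation are easy to compare numerically. The sketch below is only an illustration: the function names are hypothetical, and the refined approximation $\ln K + \gamma - 1$ uses the Gumbel mean $\gamma$ together with $a_K = 1$ and $b_K = \ln K$, which is an assumption about how one would sharpen the leading-order term rather than a formula taken from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def exact_gap(lam, K):
    """lam * (H_K - 1), using E[max of K i.i.d. Exp(1)] = H_K (the K-th harmonic number)."""
    harmonic = sum(1.0 / j for j in range(1, K + 1))
    return lam * (harmonic - 1.0)


def gumbel_gap(lam, K):
    """lam * (ln K + gamma - 1): Gumbel-based approximation of the same gap (assumed refinement)."""
    return lam * (math.log(K) + EULER_GAMMA - 1.0)
```

For instance, `exact_gap(100.0, 2)` gives 50, and for large `K` the two functions differ by roughly `lam / (2 * K)`.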

For another example, if the service vector has i.i.d. components with a lognormal distribution $LN(0,1)$, we have, for $k = 1,\dots,K$,

\[ \bar Y_k^{NES}(\infty) - \bar Y_k^{ES}(\infty) \sim \lambda\big(1/a_K + b_K - e^{1/2}\big) \quad \text{as } K \to \infty, \tag{2.54} \]

where $a_K$ and $b_K$ are (see Example 1.7.4 of [31]):

\[ a_K = (2\ln K)^{1/2} \exp\Big\{-(2\ln K)^{1/2} + 0.5\,(2\ln K)^{-1/2}\big(\ln\ln K + \ln(4\pi)\big)\Big\}, \]

and

\[ b_K = \exp\Big\{(2\ln K)^{1/2} - 0.5\,(2\ln K)^{-1/2}\big(\ln\ln K + \ln(4\pi)\big)\Big\}. \]


2.6 Numerical Example 

In this section, we provide a numerical example with two parallel tasks ($K = 2$), comparing our approximations with simulations. We let the arrival process be renewal with arrival rate $\lambda = 100$ and SCV $c_a^2 = 5$. The service times of the two parallel tasks are assumed to have a bivariate Marshall-Olkin hyperexponential distribution, which is a mixture of two independent bivariate Marshall-Olkin exponential distributions [34]. A bivariate Marshall-Olkin exponential distribution function $F_{MO}(x,y)$ for the random vector $(X,Y)$ can be written as $F^c_{MO}(x,y) := P(X > x, Y > y) = \exp(-\mu_1 x - \mu_2 y - \mu_{12}(x \vee y))$, $x,y \ge 0$, where the three parameters $\mu_1$, $\mu_2$, $\mu_{12}$ are such that the two marginals are exponential with rates $\mu_1 + \mu_{12}$ and $\mu_2 + \mu_{12}$ and their correlation is $\rho = \mu_{12}/(\mu_1 + \mu_2 + \mu_{12}) \in [0,1]$. We write $MO(\lambda_1, \lambda_2, \rho)$ for a bivariate Marshall-Olkin exponential distribution, where $\lambda_1$ and $\lambda_2$ are the rates of the marginals and $\rho$ is the correlation parameter, for which the parameters are $\mu_1 = (\lambda_1 - \rho\lambda_2)/(1+\rho)$, $\mu_2 = (\lambda_2 - \rho\lambda_1)/(1+\rho)$ and $\mu_{12} = \rho(\lambda_1 + \lambda_2)/(1+\rho)$. In the numerical example, we take a mixture of $MO(4/5, 1, \rho_1)$ with probability 0.4 and $MO(6/5, 6/5, \rho_2)$ with probability 0.6, such that the means of the two hyperexponential marginals are $m_{s,1} = 1$ and $m_{s,2} = 0.9$. By setting $\rho_1 = \rho_2 = 0$, we have two independent parallel service times, and by setting $\rho_1 = 0.7$ and $\rho_2 = 172/679$, we obtain that the correlation (see the correlation formula in §5.2 of [40]) between the two parallel service times is equal to 0.5.
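The parametrization $MO(\lambda_1, \lambda_2, \rho)$ can be turned into a sampler through the usual shock construction of the Marshall-Olkin distribution, $X = \min(E_1, E_{12})$ and $Y = \min(E_2, E_{12})$ with independent exponential shocks of rates $\mu_1$, $\mu_2$, $\mu_{12}$. The sketch below is a minimal illustration; the function names are hypothetical and the simulation code behind the reported results is not described in the text.

```python
import math
import random


def mo_rates(lam1, lam2, rho):
    """Map marginal rates and correlation to the shock rates (mu1, mu2, mu12)."""
    mu1 = (lam1 - rho * lam2) / (1.0 + rho)
    mu2 = (lam2 - rho * lam1) / (1.0 + rho)
    mu12 = rho * (lam1 + lam2) / (1.0 + rho)
    if min(mu1, mu2, mu12) < 0.0:
        raise ValueError("rho is not admissible for these marginal rates")
    return mu1, mu2, mu12


def _shock(rate, rng):
    """Exponential shock time; a zero rate means the shock never occurs."""
    return rng.expovariate(rate) if rate > 0.0 else math.inf


def sample_mo(lam1, lam2, rho, rng):
    """Draw one pair (X, Y) from MO(lam1, lam2, rho) via the shock model."""
    mu1, mu2, mu12 = mo_rates(lam1, lam2, rho)
    e1, e2, e12 = _shock(mu1, rng), _shock(mu2, rng), _shock(mu12, rng)
    return min(e1, e12), min(e2, e12)
```

A mixture such as the one used in the example can then be sampled by first choosing the component (with probabilities 0.4 and 0.6) and then calling `sample_mo` with that component's parameters.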

In Table 1, we show the approximation values for the mean, variance and covariance of $X_k$ and $Y_k$, for $k = 1,2$, and compare them with the corresponding simulated values. To estimate the simulated values, we simulated the system up to time 40 with 4000 independent replications starting with an empty system, which we call one experiment. In each replication, we collected data over the time interval $[20,40]$ and formed the time average (the system tends to be in steady state in less than 5 time units). We conducted 5 independent experiments and took sample averages as estimates of the simulated values. To construct the 95% confidence interval (CI), we used the Student $t$-distribution with four degrees of freedom. The halfwidth of the 95% CI is $2.776\,s_5/\sqrt{5}$, where $s_5$ is the sample standard deviation.
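The confidence interval halfwidth described above amounts to a one-line computation. The sketch below is illustrative: the function name is hypothetical, and the constant 2.776 is the $t$ quantile quoted in the text for four degrees of freedom.

```python
import math
import statistics

T_QUANTILE_DF4 = 2.776  # 97.5% quantile of Student's t with 4 degrees of freedom


def ci_halfwidth(experiment_means):
    """Halfwidth 2.776 * s_5 / sqrt(5) of the 95% CI from five experiment averages."""
    if len(experiment_means) != 5:
        raise ValueError("the quantile 2.776 assumes exactly five experiments")
    s5 = statistics.stdev(experiment_means)  # sample standard deviation
    return T_QUANTILE_DF4 * s5 / math.sqrt(5)
```

The point estimate reported next to the halfwidth would be `statistics.mean(experiment_means)`.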

We make several remarks on the numerical example. First, our approximations match the simulated values very well. Second, the size of the waiting buffers for synchronization is quite large, of the same order as the number of tasks in the service stations. Third, we find that when the two parallel tasks are positively correlated, the mean and the variance of the $X_k$'s are the same as those in the independent case, while the covariance between $X_1$ and $X_2$ gets larger. The means, variances and covariances of the $Y_k$'s and the covariances between $X_k$ and $Y_j$ become smaller than those in the independent case, $j,k = 1,2$. These are also consistent with the observations in Corollary 2.1. Note that this numerical example is more general than that considered in Corollary 2.1.
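The approximation rows of Table 1 for $\rho = 0$ can be recomputed from the steady-state formulas (2.26), (2.27), (2.44) and (2.45). The sketch below assumes that the $\rho = 0$ case means independent service times with the hyperexponential marginals stated above (so that the joint tail is the product of the marginal tails), and it represents each function as a sum of exponentials so that the integrals are available in closed form. The helper names are hypothetical.

```python
from itertools import product

LAM, CA2 = 100.0, 5.0  # arrival rate and SCV of the renewal arrivals

# Complementary marginal cdfs as sums of exponentials, {rate: weight}.
F1C = {0.8: 0.4, 1.2: 0.6}  # station 1, mean 1
F2C = {1.0: 0.4, 1.2: 0.6}  # station 2, mean 0.9


def mul(p, q):
    """Product of two exponential sums."""
    out = {}
    for (r1, c1), (r2, c2) in product(p.items(), q.items()):
        out[r1 + r2] = out.get(r1 + r2, 0.0) + c1 * c2
    return out


def sub(p, q):
    """Difference p - q of two exponential sums."""
    out = dict(p)
    for r, c in q.items():
        out[r] = out.get(r, 0.0) - c
    return out


def integral(p):
    """Integral over [0, inf) of sum_r c_r * exp(-r s); all rates must be positive."""
    return sum(c / r for r, c in p.items())


def steady_cov(first, second):
    """lam * int [first + (c_a^2 - 1) * second] ds, the common shape of (2.44)-(2.45)."""
    return LAM * (integral(first) + (CA2 - 1.0) * integral(second))


F12C = mul(F1C, F2C)   # P(eta_1 > s, eta_2 > s) under the independence assumption
G1 = sub(F2C, F12C)    # F_1 - F_m = F_1 * F_2^c when K = 2
G2 = sub(F1C, F12C)    # F_2 - F_m = F_2 * F_1^c when K = 2

mean_x = (LAM * integral(F1C), LAM * integral(F2C))
var_x = (steady_cov(F1C, mul(F1C, F1C)), steady_cov(F2C, mul(F2C, F2C)))
cov_x = steady_cov(F12C, F12C)
mean_y = (LAM * integral(G1), LAM * integral(G2))
var_y = (steady_cov(G1, mul(G1, G1)), steady_cov(G2, mul(G2, G2)))
cov_y = steady_cov({}, mul(G1, G2))  # first term F_{1,2} - F_m vanishes when K = 2
```

Under these assumptions the computed values agree, after rounding to two decimals, with the “Approx.” entries of Table 1 for $\rho = 0$.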


3 The multi-server fork-join network model 

3.1 Model and Assumptions 

In this section, we present a detailed description of our multi-server fork-join network model. We 
consider a fork-join network with a single class of jobs, and each job is forked into K (K > 1) 
parallel tasks. Each task is processed in a service station with finite servers under the non-idling 
FCFS discipline. Namely, a newly arriving task immediately gets served if there is an idle server in 
that station, and joins the back of the queue otherwise, and the task waiting for the longest in the 
queue enters service as soon as a server in that station becomes available. After service completion, 
each task will join a waiting buffer for synchronization associated with each service station, and 
when all tasks of the same job are completed, they will be synchronized and leave the system. Here 
we assume that the synchronization process takes zero amount of time. 

Table 1: Comparing approximations with simulations in a stationary model

(X_1, X_2)
                           (E[X_1], E[X_2])               (Var(X_1), Var(X_2))             Cov(X_1, X_2)
ρ = 0     Sim. (95% CI)    (99.99 ± 0.17, 89.98 ± 0.12)   (296.26 ± 0.66, 269.46 ± 0.70)   234.14 ± 0.66
          Approx.          (100.00, 90.00)                (296.00, 269.27)                 233.99
ρ = 0.5   Sim. (95% CI)    (99.98 ± 0.04, 89.99 ± 0.04)   (296.08 ± 0.57, 269.23 ± 0.80)   256.34 ± 0.43
          Approx.          (100.00, 90.00)                (296.00, 269.27)                 256.30

(Y_1, Y_2)
                           (E[Y_1], E[Y_2])               (Var(Y_1), Var(Y_2))             Cov(Y_1, Y_2)
ρ = 0     Sim. (95% CI)    (43.18 ± 0.05, 53.20 ± 0.10)   (70.12 ± 0.20, 89.85 ± 0.40)     31.53 ± 0.30
          Approx.          (43.20, 53.20)                 (70.31, 90.08)                   31.55
ρ = 0.5   Sim. (95% CI)    (20.89 ± 0.01, 30.88 ± 0.02)   (27.14 ± 0.15, 42.23 ± 0.35)     8.36 ± 0.07
          Approx.          (20.89, 30.89)                 (27.05, 42.23)                   8.31

(X, Y)
                           Cov(X_1, Y_1)     Cov(X_1, Y_2)     Cov(X_2, Y_1)     Cov(X_2, Y_2)
ρ = 0     Sim. (95% CI)    60.80 (± 0.59)    122.87 (± 0.61)   99.21 (± 0.42)    64.56 (± 0.54)
          Approx.          61.09             123.10            99.85             64.57
ρ = 0.5   Sim. (95% CI)    28.72 (± 0.33)    68.37 (± 0.73)    47.51 (± 0.42)    34.49 (± 0.44)
          Approx.          28.67             68.37             47.41             34.44

Let $A := \{A(t) : t \ge 0\}$ be the arrival process of jobs after time 0. Let $\tau_i$ be the arrival time of the $i$th job, $i \in \mathbb{N}$, that is, $A(t) = \max\{i \ge 1 : \tau_i \le t\}$ for $t > 0$ and $A(0) = 0$. Let $N_k$ be the number of servers at service station $k$, $k = 1,\dots,K$. Each job brings in a $K$-dimensional service vector, representing the service times at the service stations, which can be correlated. Let $\eta^i := (\eta^i_1,\dots,\eta^i_K)$ be the service vector of the job that arrives at time $\tau_i$, $i \in \mathbb{N}$, where $\eta^i_k$ is the service time at the $k$th service station. We assume that the sequence $\{\eta^i : i \ge 1\}$ is i.i.d., and let the joint distribution function of $\eta^1$ be $F(x) = F(x_1,\dots,x_K)$ for $x_k \ge 0$, $k = 1,\dots,K$. Let $F^c(x) := P(\eta^1_1 > x_1,\dots,\eta^1_K > x_K)$, for $x_1,\dots,x_K \ge 0$. The marginal distributions are $F_k(\cdot)$ with mean $1/\mu_k \in (0,\infty)$, for $k = 1,\dots,K$. Let $\eta^i_m := \max\{\eta^i_1,\dots,\eta^i_K\}$ and $F_m(x) := P(\eta^1_m \le x) = P(\eta^1_j \le x,\ \forall j) = F(x,\dots,x)$ for $x \ge 0$. (Throughout this paper, we use “m” to index quantities and processes associated with the maximum.) We make a regularity assumption on the service time distributions for the parallel tasks.

Assumption 1. The joint distribution function $F(x)$ of the service time vector $\eta^i$, $i \in \mathbb{N}$, is continuous.

State Descriptors. Let $X_k := \{X_k(t) : t \ge 0\}$ be the process counting the number of tasks at service station $k$, and $Y_k := \{Y_k(t) : t \ge 0\}$ be the process counting the number of tasks in the waiting buffer for synchronization (unsynchronized queue) after service completion at service station $k$, $k = 1,\dots,K$. Denote $X := (X_1,\dots,X_K)$ and $Y := (Y_1,\dots,Y_K)$. Let $S := \{S(t) : t \ge 0\}$ be the process counting the number of synchronized jobs by each time $t \ge 0$. In addition, let $Q_k := \{Q_k(t) : t \ge 0\}$ and $B_k := \{B_k(t) : t \ge 0\}$ be the processes representing the queue length and the number of tasks in service at station $k$, respectively, $k = 1,\dots,K$. Let $D_k := \{D_k(t) : t \ge 0\}$ be the cumulative service completion (departure) process at service station $k$, $k = 1,\dots,K$. Denote $Q := (Q_1,\dots,Q_K)$, $B := (B_1,\dots,B_K)$, and $D := (D_1,\dots,D_K)$.

A Sequence of Systems. We consider a sequence of the above fork-join networks, indexed by the superscript $n$, and let $n \to \infty$. We assume that each service station is operating in the many-server heavy-traffic asymptotic regimes, where the arrival rate of jobs and the number of servers get large appropriately while the service time distributions are fixed. In establishing the FLLN, we allow the arrival rate to be time-dependent. In establishing the FCLT, we will assume that each service station is operating in the Halfin-Whitt (QED) regime, so that it is critically loaded with a constant arrival rate (see Assumption 4 for the precise definition). For any process $X$, we use $X^n$ to represent the associated process in the sequence of the fork-join networks.

Some Fundamental Flow Balance Equations. For each service station $k$, $k = 1,\dots,K$, and for each $t \ge 0$, we have the following flow conservation equations:

\[ X_k^n(t) = B_k^n(t) + Q_k^n(t), \tag{3.1} \]

\[ X_k^n(t) = X_k^n(0) + A^n(t) - D_k^n(t), \tag{3.2} \]

\[ Y_k^n(t) = Y_k^n(0) + D_k^n(t) - S^n(t). \tag{3.3} \]

The non-idling condition implies that for each $k = 1,\dots,K$ and $t \ge 0$,

\[ B_k^n(t) = X_k^n(t) \wedge N_k^n, \quad Q_k^n(t) = (X_k^n(t) - N_k^n)^+. \tag{3.4} \]

In addition, we have the following flow balance equation, for each $k,k' = 1,\dots,K$, $k \ne k'$, and $t \ge 0$,

\[ X_k^n(t) + Y_k^n(t) = X_{k'}^n(t) + Y_{k'}^n(t), \tag{3.5} \]

that is, the total number of tasks in a service station and its associated waiting buffer for synchronization is the same for every station at all times, and is equal to the total number of jobs in the system.
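The relations (3.1), (3.4) and (3.5) translate directly into code. The sketch below uses hypothetical function names and plain integer counts; it is meant only to make the bookkeeping concrete.

```python
def station_state(x, n_servers):
    """Split the number of tasks x at a station into (busy servers, queue length).

    Implements the non-idling relations B = x ∧ N and Q = (x - N)^+ of (3.4);
    by construction B + Q = x, which is (3.1).
    """
    busy = min(x, n_servers)
    queue = max(x - n_servers, 0)
    return busy, queue


def jobs_in_system(x, y):
    """Return the common value of X_k + Y_k over all stations, as in (3.5).

    x[k] is the number of tasks at station k and y[k] the number of tasks in
    its waiting buffer for synchronization. Raises if the balance is violated.
    """
    totals = {xk + yk for xk, yk in zip(x, y)}
    if len(totals) != 1:
        raise ValueError("flow balance (3.5) is violated")
    return totals.pop()
```

For example, with 5 tasks at a station with 3 servers, `station_state(5, 3)` returns `(3, 2)`.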

3.2 Fluid Limit 

In this section, we present the fluid limit for the fork-join network. We assume that the system 
starts from empty and allow the arrival rate to be time-dependent. 

Assumption 2. There exists a continuous nondecreasing deterministic real-valued function $\bar a(t)$ on $[0,\infty)$ with $\bar a(0) = 0$ such that

\[ \bar A^n(t) := n^{-1} A^n(t) \Rightarrow \bar a(t) \quad \text{in } \mathbb{D} \text{ as } n \to \infty. \tag{3.6} \]

We also make the following assumption on the numbers of servers.

Assumption 3. For $k = 1,\dots,K$, $\bar N_k^n := N_k^n/n \to N_k > 0$ as $n \to \infty$.

Under the empty initial condition, we can write the processes $X_k^n(t)$, $Y_k^n(t)$, $k = 1,\dots,K$, and $S^n(t)$ as

\[ X_k^n(t) = \sum_{i=1}^{A^n(t)} \mathbf{1}(\tau_i^n + w_k^{n,i} + \eta_k^i > t), \quad t \ge 0, \tag{3.7} \]

\[ Y_k^n(t) = \sum_{i=1}^{A^n(t)} \mathbf{1}(\tau_i^n + w_k^{n,i} + \eta_k^i \le t,\ \tau_i^n + w_{k'}^{n,i} + \eta_{k'}^i > t \text{ for some } k' \ne k), \quad t \ge 0, \tag{3.8} \]

\[ S^n(t) = \sum_{i=1}^{A^n(t)} \mathbf{1}(\tau_i^n + w_k^{n,i} + \eta_k^i \le t,\ \forall k = 1,\dots,K), \quad t \ge 0, \tag{3.9} \]

where $w_k^{n,i}$ is the waiting time of the $i$th arrival at station $k$, $i \in \mathbb{N}$.

In addition, for $k = 1,\dots,K$, let $E_k^n(t)$ be the number of tasks that have entered service at station $k$ by time $t$, $t \ge 0$, and set $E_k^n := \{E_k^n(t) : t \ge 0\}$. Denote $E^n := (E_1^n,\dots,E_K^n)$. For each service station $k = 1,\dots,K$, we also have the balance equation

\[ E_k^n(t) = A^n(t) - Q_k^n(t) = A^n(t) - (X_k^n(t) - N_k^n)^+, \quad t \ge 0. \tag{3.10} \]

Define the fluid-scaled processes $\bar X^n := n^{-1} X^n$, and similarly $\bar Y^n$, $\bar S^n$, $\bar E^n$, $\bar Q^n$, $\bar B^n$ and $\bar D^n$. We now state the FLLN for the fluid-scaled processes.

Theorem 3.1. Under Assumptions 1-3,

\[ (\bar A^n, \bar X^n, \bar Y^n, \bar S^n, \bar E^n, \bar Q^n, \bar B^n, \bar D^n) \Rightarrow (\bar a, \bar X, \bar Y, \bar S, \bar E, \bar Q, \bar B, \bar D) \tag{3.11} \]

in $\mathbb{D}^{6K+2}$ as $n \to \infty$, where the limits are all deterministic functions: $\bar a$ is the limit in (3.6), $(\bar E, \bar X, \bar Y, \bar S)$ is the unique solution to the following: for $t \ge 0$ and $k = 1,\dots,K$,

\[ \bar X_k(t) = \int_0^t F_k^c(t-s)\,d\bar a(s) + \int_0^t (\bar X_k(t-s) - N_k)^+\,dF_k(s), \tag{3.12} \]

\[ \bar E_k(t) = \bar a(t) - (\bar X_k(t) - N_k)^+, \tag{3.13} \]

\[ \bar S(t) = \int_{[0,t]^K} \Big(\min_{1 \le k \le K}\{\bar E_k(t - s_k)\}\Big)\,dF(s_1,\dots,s_K), \tag{3.14} \]

\[ \bar Y_k(t) = \int_0^t F_k(t-s)\,d\bar a(s) - \int_0^t (\bar X_k(t-s) - N_k)^+\,dF_k(s) - \bar S(t), \tag{3.15} \]

and the limits $\bar Q$, $\bar B$ and $\bar D$ satisfy

\[ \bar Q_k(t) = (\bar X_k(t) - N_k)^+, \quad \bar B_k(t) = \bar X_k(t) \wedge N_k, \quad \bar D_k(t) = \bar a(t) - \bar X_k(t). \tag{3.16} \]

It is easy to check that for each $k = 1,\dots,K$, the limit $\bar X_k(t)$ also satisfies the following equation:

\[ \bar X_k(t) = \bar a(t) - \int_0^t \bar E_k(t-s)\,dF_k(s), \quad t \ge 0. \tag{3.17} \]

When $\bar a(t) = \int_0^t \lambda(s)\,ds$, where $\lambda(\cdot)$ is a positive function, and the service times are exponential (independent or dependent), for each $k = 1,\dots,K$, the fluid limit $\bar X_k$ in (3.12) and (3.17) becomes an ordinary differential equation (ODE) [38], but the fluid limit $\bar Y_k$ in (3.15) does not have an ODE representation. We remark that the fluid limit $\bar X_k$ for each $k = 1,\dots,K$ depends only on the marginal distribution $F_k$, while the fluid limits $\bar Y_k$, $k = 1,\dots,K$, and $\bar S$ depend on the joint distribution $F$. However, as the FCLT (Theorem 3.2) below shows, the limits for all these processes in the diffusion scale will depend on the joint distribution $F$.

When the arrival rate is constant and each service station is underloaded or critically loaded, we 
give a corollary on the steady states of the fluid limits. The proof follows from a direct calculation 
and is omitted. It is evident that correlation among service times of parallel tasks only affects the 
steady state of Y but not that of X. 

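Corollary 3.1. Under Assumptions 1-3, if the arrival rate is constant, $\bar a(t) = \lambda t$, for $\lambda$ satisfying $0 < \lambda \le N_k \mu_k$ for all $k = 1,\dots,K$,

\[ (\bar X(t), \bar Y(t), \bar Q(t), \bar B(t)) \to (\bar X(\infty), \bar Y(\infty), \bar Q(\infty), \bar B(\infty)) \quad \text{as } t \to \infty, \]

and

\[ \frac{1}{t}(\bar D(t), \bar E(t), \bar S(t)) \to \boldsymbol{\lambda} := (\lambda,\dots,\lambda) \quad \text{as } t \to \infty, \]

where

\[ \bar X_k(\infty) = \bar B_k(\infty) = \lambda E[\eta_k^1] = \lambda/\mu_k, \quad \bar Y_k(\infty) = \lambda\big(E[\eta_m^1] - E[\eta_k^1]\big), \quad \bar Q_k(\infty) = 0. \]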


3.2.1 Numerical Examples 

We give two numerical examples to show the effectiveness of the fluid approximations in comparison with simulations, when $K = 2$. We let the arrival process be Poisson with time-varying rate $\lambda(t) = 200 + 120\sin(t)$, $t \ge 0$. The numbers of servers in stations 1 and 2 are $N_1 = 300$ and $N_2 = 340$, respectively. In the first numerical example, the service times of the two parallel tasks are assumed to have a bivariate Marshall-Olkin exponential distribution [34]. We set the service times to be $MO(1, 0.9, \rho)$, such that the service times of the two parallel tasks have exponential marginals with means 1 and 10/9 in stations 1 and 2, respectively, and their correlation is $\rho$. The numerical results with $\rho = 0$ and $\rho = 0.5$ are provided in Figure 3(a), marked with “ind.” and “corr.”, respectively. In the second numerical example, we let the service times of the two parallel tasks have a bivariate Marshall-Olkin hyperexponential distribution [40], which is a mixture of two independent bivariate Marshall-Olkin exponential distributions. Specifically, we take a mixture of $MO(4/5, 1, \rho_1)$ with probability 0.4 and $MO(6/5, 27/32, \rho_2)$ with probability 0.6, such that the two parallel service times have hyperexponential marginals with the same means as in the first example. By setting $\rho_1 = \rho_2 = 0$, we have two independent parallel service times, and by setting $\rho_1 = 0.7$ and $\rho_2 = 521/1232$, we get the correlation between the two parallel service times to be 0.5. In Figure 3(b), we show the numerical results with $\rho = 0$ (“ind.”) and $\rho = 0.5$ (“corr.”). To calculate the simulated values, we simulated the system up to time 20 with 500 independent replications starting with an empty system. We make two remarks on the numerical results. First, the fluid approximations match the simulated results very well. Second, the positive correlation among parallel service times does not affect $X_k$, but reduces $Y_k$, for $k = 1,2$.
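For exponential service times, the fluid content of a single station can be computed by integrating an ODE. The sketch below assumes the standard many-server fluid form $x'(t) = \lambda(t) - \mu \min(x(t), N)$, which is an assumption of this illustration (the text only states that an ODE representation exists, citing [38]), and uses a plain forward Euler scheme. The function name and step size are hypothetical.

```python
import math


def fluid_path(rate, mu, servers, horizon, dt=1e-3, x0=0.0):
    """Forward-Euler solution of x'(t) = rate(t) - mu * min(x(t), servers).

    rate    : callable giving the arrival rate at time t
    mu      : exponential service rate at the station
    servers : number of servers N at the station
    Returns a list of (t, x) pairs on the grid 0, dt, 2*dt, ..., horizon.
    """
    steps = int(round(horizon / dt))
    x = x0
    path = [(0.0, x)]
    for i in range(steps):
        t = i * dt
        x += dt * (rate(t) - mu * min(x, servers))
        path.append(((i + 1) * dt, x))
    return path


def sinusoidal_rate(t):
    """Arrival rate of the numerical examples: 200 + 120 sin(t)."""
    return 200.0 + 120.0 * math.sin(t)
```

Station 1 of the first example would correspond to `fluid_path(sinusoidal_rate, 1.0, 300.0, 20.0)`, starting from an empty system.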


3.3 FCLT in the Halfin-Whitt regime 

Figure 3: Comparison of fluid approximations with simulations.

In this section, we study the fork-join network with NES in the Halfin-Whitt regime, which requires that each service station operates in a critically loaded regime asymptotically. Specifically, we assume the following. Let $\lambda^n$ be the arrival rate of jobs such that $\bar\lambda^n := \lambda^n/n \to \lambda > 0$ as $n \to \infty$, and set $N_k^n := nN_k$, where $N_k \in \mathbb{N}$, and $\rho_k^n := \lambda^n/(\mu_k N_k^n)$ for each $k = 1,\dots,K$.

Assumption 4. For each $k = 1,\dots,K$, $\lambda = N_k \mu_k$ and $\sqrt{n}(1 - \rho_k^n) \to \beta_k > 0$, as $n \to \infty$.

The arrival processes $A^n = \{A^n(t) : t \ge 0\}$ satisfy an FCLT.

Assumption 5. There exists a stochastic process $\hat A$ with continuous sample paths satisfying

\[ \hat A^n(t) := \frac{A^n(t) - \lambda^n t}{\sqrt{n}} \Rightarrow \hat A(t) \quad \text{in } \mathbb{D} \text{ as } n \to \infty. \tag{3.18} \]

It follows from (3.18) that we have the associated FLLN:

\[ \bar A^n(t) \Rightarrow \lambda t \quad \text{in } \mathbb{D} \text{ as } n \to \infty. \tag{3.19} \]

We now describe the non-empty initial conditions. Due to the complexity arising from the initial conditions, we focus on the case of $K = 2$, but our approach can be extended to $K > 2$. For convenience, we use the notation $k'$ to denote the counterpart of $k$, i.e., $k' = 1$ ($k' = 2$, respectively) if $k = 2$ ($k = 1$, respectively), for $k = 1,2$. At time $0-$, there are $X_k^n(0)$ tasks at service station $k$, and $Y_k^n(0)$ tasks in its associated waiting buffer for synchronization, for $k = 1,2$. Let $X^n(0) := (X_1^n(0), X_2^n(0))$ and $Y^n(0) := (Y_1^n(0), Y_2^n(0))$. Recall the flow balance equation (3.5). At time $0-$,

\[ X_k^n(0) + Y_k^n(0) = X_{k'}^n(0) + Y_{k'}^n(0), \quad k = 1,2, \tag{3.20} \]

which is equal to the number of jobs in the system. Note that $X_k^n(0) \ge Y_{k'}^n(0)$ for each $k = 1,2$, since the jobs with a task in the waiting buffer for synchronization associated with station $k'$ must have their other task in station $k$, either in service or in queue. Let $B_k^n(0) := \min(X_k^n(0), N_k^n)$ and $Q_k^n(0) := (X_k^n(0) - N_k^n)^+$ be the number of tasks in service (busy servers) and the queue length at station $k$ at time $0-$, respectively, $k = 1,2$. We also assume that $Y_{k'}^n(0) \le B_k^n(0)$ for $k = 1,2$. This is not a restrictive assumption, because in the Halfin-Whitt regime, waiting times for service at each station are $O(1/\sqrt{n})$ and service times are $O(1)$, and jobs that have completed tasks in one station and joined its waiting buffer for synchronization have their associated tasks receiving service in the other station with probability one asymptotically.
Let $J^n(0) := \min_{k=1,2}\{B_k^n(0) - Y_{k'}^n(0)\}$ be the number of jobs both of whose tasks are in service at time $0-$. Then $Z_k^n(0) := B_k^n(0) - Y_{k'}^n(0) - J^n(0)$ represents the number of jobs in the system at time $0-$ whose task $k$ is in service but whose task $k'$ is in queue waiting for service, $k = 1,2$. Let $I^n(0) := Q_1^n(0) \wedge Q_2^n(0)$ be the number of jobs (if any) both of whose tasks are in queue at their service stations at time $0-$. Then $R_k^n(0) := Q_k^n(0) - I^n(0)$ represents the number of jobs (if any) whose task $k$ is waiting in queue for service while task $k'$ is in service, $k = 1,2$. (Note that our assumption above implies that if a job is waiting in queue at station $k$, its parallel task can be either in queue or in service at station $k'$.) By our definition, we can see that $Z_k^n(0) = R_{k'}^n(0)$, $k = 1,2$. Set $R^n(0) := (R_1^n(0), R_2^n(0))$ and $Z^n(0) := (Z_1^n(0), Z_2^n(0))$. We also obtain a decomposition for $X_k^n(0)$:

\[ X_k^n(0) = B_k^n(0) + Q_k^n(0) = Y_{k'}^n(0) + J^n(0) + Z_k^n(0) + I^n(0) + R_k^n(0), \quad k = 1,2. \tag{3.21} \]

We let $\{\tilde w_k^{n,i} : i = 1,\dots,Q_k^n(0)\}$ be the sequence of remaining waiting times of the tasks in queue at station $k$ at time $0-$, $k = 1,2$. It is in the order of their positions in queue: $\tilde w_k^{n,1}$ is the remaining waiting time of the task at the front of the queue, while $\tilde w_k^{n,Q_k^n(0)}$ is that of the task at the end of the queue at station $k$ at time $0-$, $k = 1,2$. Let $\{\tilde\eta_k^i : i = 1,\dots,B_k^n(0)\}$ be the sequence of remaining service times of the tasks in service at station $k$ at time $0-$, for $k = 1,2$. Let $\{\eta_k^{i,Q} : i = 1,\dots,Q_k^n(0)\}$ be the sequence of service times of the tasks in station $k$ that are in queue at time $0-$, $k = 1,2$. With a slight abuse of notation, we use $\{\tilde\eta_k^{i,Y} : i = 1,\dots,Y_{k'}^n(0)\}$, $\{\tilde\eta_k^{i,J} : i = 1,\dots,J^n(0)\}$ and $\{\tilde\eta_k^{i,Z} : i = 1,\dots,Z_k^n(0)\}$, which partition $\{\tilde\eta_k^i : i = 1,\dots,B_k^n(0)\}$, to represent the remaining service times of the tasks in station $k$ at time $0-$ corresponding to the quantities $Y_{k'}^n(0)$, $J^n(0)$ and $Z_k^n(0)$, respectively, $k = 1,2$. Similarly, we use $\{\tilde w_k^{n,i,I} : i = 1,\dots,I^n(0)\}$ and $\{\tilde w_k^{n,i,R} : i = 1,\dots,R_k^n(0)\}$, which partition $\{\tilde w_k^{n,i} : i = 1,\dots,Q_k^n(0)\}$, to represent the remaining waiting times of the tasks in station $k$ at time $0-$ corresponding to the quantities $I^n(0)$ and $R_k^n(0)$, respectively, $k = 1,2$. Finally, we use $\{\eta_k^{i,I} : i = 1,\dots,I^n(0)\}$ and $\{\eta_k^{i,R} : i = 1,\dots,R_k^n(0)\}$, which partition $\{\eta_k^{i,Q} : i = 1,\dots,Q_k^n(0)\}$, to represent the service times of the tasks in station $k$ corresponding to the quantities $I^n(0)$ and $R_k^n(0)$ in queue at time $0-$, respectively, $k = 1,2$. We assume that these initial quantities are independent of the arrival process $A^n$ and the service times of new arrivals after time 0.

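We can now give a representation for the processes $X^n$, $Y^n$ and $S^n$: for $t \ge 0$ and $k = 1,2$,

\[ X^n_k(t) = \sum_{i=1}^{B^n_k(0)} \mathbf{1}(\tilde\eta^{0}_{i,k} > t) + \sum_{i=1}^{Q^n_k(0)} \mathbf{1}(w^{n,0}_{i,k} + \eta^{0}_{i,k} > t) + \sum_{i=1}^{A^n(t)} \mathbf{1}(\tau^n_i + w^n_{i,k} + \eta_{i,k} > t), \tag{3.22} \]

\begin{align*}
S^n(t) = {} & \sum_{i=1}^{Y^n_2(0)} \mathbf{1}(\tilde\eta^{0,Y}_{i,1} \le t) + \sum_{i=1}^{Y^n_1(0)} \mathbf{1}(\tilde\eta^{0,Y}_{i,2} \le t) + \sum_{i=1}^{J^n(0)} \mathbf{1}(\tilde\eta^{0,J}_{i,j} \le t, \ \forall j) \tag{3.23} \\
& + \sum_{i=1}^{Z^n_1(0)} \mathbf{1}(\tilde\eta^{0,Z}_{i,1} \le t, \ w^{n,0,R}_{i,2} + \eta^{0,R}_{i,2} \le t) + \sum_{i=1}^{Z^n_2(0)} \mathbf{1}(w^{n,0,R}_{i,1} + \eta^{0,R}_{i,1} \le t, \ \tilde\eta^{0,Z}_{i,2} \le t) \\
& + \sum_{i=1}^{I^n(0)} \mathbf{1}(w^{n,0,I}_{i,j} + \eta^{0,I}_{i,j} \le t, \ \forall j) + \sum_{i=1}^{A^n(t)} \mathbf{1}(\tau^n_i + w^n_{i,j} + \eta_{i,j} \le t, \ \forall j),
\end{align*}

and

\[ Y^n_k(t) = Y^n_k(0) + X^n_k(0) + A^n(t) - X^n_k(t) - S^n(t). \tag{3.24} \]

We use the convention that $\sum_{i=1}^{0} \equiv 0$ throughout the paper.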

We impose the following assumptions on the initial quantities. 

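Assumption 6. There exists $(\bar Y_1(0), \bar Y_2(0)) \in \mathbb{R}^2$ such that

\[ (\bar X^n(0), \bar Y^n(0)) := n^{-1}(X^n(0), Y^n(0)) \Rightarrow (\bar X(0), \bar Y(0)) \quad \text{in } \mathbb{R}^4 \text{ as } n \to \infty, \]

where $\bar X(0) := (N_1, N_2)$ and $\bar Y(0) := (\bar Y_1(0), \bar Y_2(0))$. There exist random vectors $\hat X(0) := (\hat X_1(0), \hat X_2(0)) \in \mathbb{R}^2$ and $\hat Y(0) := (\hat Y_1(0), \hat Y_2(0)) \in \mathbb{R}^2$ such that

\[ (\hat X^n(0), \hat Y^n(0)) := \sqrt{n}\,(\bar X^n(0) - \bar X(0), \ \bar Y^n(0) - \bar Y(0)) \Rightarrow (\hat X(0), \hat Y(0)) \quad \text{in } \mathbb{R}^4 \text{ as } n \to \infty. \]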


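This assumption implies that the associated fluid-scaled initial quantities satisfy

\[ (\bar J^n(0), \bar Z^n(0), \bar I^n(0), \bar R^n(0)) := n^{-1}(J^n(0), Z^n(0), I^n(0), R^n(0)) \Rightarrow (\bar J(0), \bar Z(0), \bar I(0), \bar R(0)) \]

in $\mathbb{R}^6$ as $n \to \infty$, where

\[ \bar J(0) := N_1 - \bar Y_2(0) = N_2 - \bar Y_1(0), \quad \bar Z(0) := (\bar Z_1(0), \bar Z_2(0)) := (0,0), \quad \bar I(0) := 0, \quad \bar R(0) := (0,0). \]

Define the associated diffusion-scaled quantities $(\hat J^n(0), \hat Z^n(0), \hat I^n(0), \hat R^n(0))$ by

\[ \hat J^n(0) := \frac{J^n(0) - n\bar J(0)}{\sqrt{n}}, \quad \hat Z^n_k(0) := \frac{Z^n_k(0)}{\sqrt{n}}, \quad \hat I^n(0) := \frac{I^n(0)}{\sqrt{n}}, \quad \hat R^n_k(0) := \frac{R^n_k(0)}{\sqrt{n}}, \quad k = 1,2. \]

Then Assumption 6 implies that

\[ (\hat J^n(0), \hat Z^n(0), \hat I^n(0), \hat R^n(0)) \Rightarrow (\hat J(0), \hat Z(0), \hat I(0), \hat R(0)) \quad \text{in } \mathbb{R}^6 \text{ as } n \to \infty, \]

where

\[ \hat J(0) := \min_{k=1,2}\{-(\hat X_k(0))^- - \hat Y_{k'}(0)\}, \quad \hat Z_k(0) := -(\hat X_k(0))^- - \hat Y_{k'}(0) - \hat J(0), \quad k = 1,2, \]
\[ \hat I(0) := \min_{k=1,2}(\hat X_k(0))^+, \quad \hat R_k(0) := (\hat X_k(0))^+ - \hat I(0), \quad k = 1,2. \]

Let

\[ F_{k,e}(x) := \frac{1}{E[\eta_{1,k}]} \int_0^x (1 - F_k(u))\,du, \quad x \ge 0, \]

be the equilibrium distribution associated with $F_k$, $k = 1,2$.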
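
As a numerical illustration of the equilibrium distribution, the sketch below assumes the standard stationary-excess form $F_e(x) = m^{-1}\int_0^x (1 - F(u))\,du$, where $m$ is the mean of $F$. The function name `equilibrium_cdf`, the midpoint rule and the two test distributions are choices made here for illustration, not taken from the paper.

```python
import math


def equilibrium_cdf(F, mean, x, steps=10_000):
    """Midpoint-rule approximation of F_e(x) = (1/mean) * int_0^x (1 - F(u)) du."""
    h = x / steps
    total = sum(1.0 - F((i + 0.5) * h) for i in range(steps))
    return total * h / mean


def exp_cdf(u):
    """CDF of an exponential distribution with rate 2 (mean 0.5)."""
    return 1.0 - math.exp(-2.0 * u)


def unif_cdf(u):
    """CDF of the uniform distribution on [0, 1] (mean 0.5)."""
    return min(max(u, 0.0), 1.0)


# The exponential law is its own equilibrium distribution.
fe_exp = equilibrium_cdf(exp_cdf, 0.5, 1.0)
# For Uniform[0, 1], F_e(x) = 2x - x^2 on [0, 1].
fe_unif = equilibrium_cdf(unif_cdf, 0.5, 0.5)
```

The exponential case shows why remaining service times of tasks in service at time 0− have the same law as full service times only in the memoryless case; for the uniform law the two differ.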


Assumption 7. For $k = 1,2$, $\{\tilde\eta^{0}_{i,k} : i \in \mathbb{N}\}$ is a sequence of i.i.d. random variables with distribution $F_{k,e}$, and for each $i \in \mathbb{N}$, $\tilde\eta^{0}_{i,1}$ and $\tilde\eta^{0}_{i,2}$ are independent. $\{\eta^{0}_{i,k} : i \in \mathbb{N}\}$ is a sequence of i.i.d. random variables with distribution $F_k$ for each $k = 1,2$. $\{(\eta^{0,I}_{i,1}, \eta^{0,I}_{i,2}) : i \in \mathbb{N}\}$ is a sequence of i.i.d. random vectors with joint distribution $F(\cdot,\cdot)$. $\{(\eta^{0,R}_{i,k}, \tilde\eta^{0,Z}_{i,k'}) : i \in \mathbb{N}\}$ is a sequence of i.i.d. random vectors with independent components, $k = 1,2$.

Finally, we also make an assumption on the residual waiting times $\{w^{n,0}_{i,k} : i = 1,\dots,Q^n_k(0)\}$, $k = 1,2$.

Assumption 8. The residual waiting times of the tasks in queue, $\{w^{n,0}_{i,k} : i = 1,\dots,Q^n_k(0)\}$, $k = 1,2$, converge to zero a.s. as $n \to \infty$.

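We define the diffusion-scaled processes $\hat X^n := (\hat X^n_1, \hat X^n_2)$, $\hat Y^n := (\hat Y^n_1, \hat Y^n_2)$ and $\hat S^n$ by

\[ \hat X^n_k(t) := \frac{X^n_k(t) - nN_k}{\sqrt{n}}, \quad \hat Y^n_k(t) := \frac{Y^n_k(t) - \bar Y^n_k(t)}{\sqrt{n}}, \quad \hat S^n(t) := \frac{S^n(t) - \bar S^n(t)}{\sqrt{n}}, \tag{3.25} \]

for $k = 1,2$, where

\[ \bar S^n(t) := n\bar S^0(t) + \lambda^n \int_0^t \int_0^t ((t - s_1) \wedge (t - s_2))\,dF(s_1, s_2), \tag{3.26} \]

\[ \bar S^0(t) := \bar Y_2(0)F_{1,e}(t) + \bar Y_1(0)F_{2,e}(t) + \bar J(0)F_{1,e}(t)F_{2,e}(t), \tag{3.27} \]

\[ \bar Y^n_k(t) := n\bar Y_k(0) + \lambda^n t - \bar S^n(t). \tag{3.28} \]

From the balance equation for $Y^n_k$ in (3.24), we can rewrite $\hat Y^n_k$ as

\[ \hat Y^n_k(t) = \hat Y^n_k(0) + \hat X^n_k(0) + \hat A^n(t) - \hat X^n_k(t) - \hat S^n(t), \quad t \ge 0, \ k = 1,2. \tag{3.29} \]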
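
The step from (3.24) to (3.29) can be spelled out as follows. This is a sketch that assumes the diffusion-scaled arrival process is centered as $\hat A^n(t) := (A^n(t) - \lambda^n t)/\sqrt{n}$ and that $X^n_k$ is centered at $nN_k$, in line with Assumption 6. Subtracting $\bar Y^n_k(t)$ in (3.28) from (3.24) and dividing by $\sqrt{n}$ gives

```latex
\begin{align*}
\hat Y^n_k(t)
&= \frac{Y^n_k(0) - n\bar Y_k(0)}{\sqrt{n}}
 + \frac{X^n_k(0) - nN_k}{\sqrt{n}}
 + \frac{A^n(t) - \lambda^n t}{\sqrt{n}}
 - \frac{X^n_k(t) - nN_k}{\sqrt{n}}
 - \frac{S^n(t) - \bar S^n(t)}{\sqrt{n}} \\
&= \hat Y^n_k(0) + \hat X^n_k(0) + \hat A^n(t) - \hat X^n_k(t) - \hat S^n(t).
\end{align*}
```

The centering terms $nN_k$ of $X^n_k(0)$ and $X^n_k(t)$ cancel, so no fluid term for $X^n_k$ appears in (3.28).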


Recall that $E^n_k(t)$ was defined in §3.2, assuming the system starts empty, as the cumulative number of tasks that have entered service at station $k$ by time $t \ge 0$, $k = 1,2$. With a slight abuse of notation, in §3.3, for the FCLT, we let $E^n_k(t)$ be the number of new arrivals after time 0 whose task $k$ has entered service at station $k$ by time $t \ge 0$, $k = 1,2$.

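Define the diffusion-scaled processes $(\hat E^n, \hat Q^n, \hat B^n, \hat D^n)$, with $\hat E^n := (\hat E^n_1, \hat E^n_2)$, $\hat Q^n := (\hat Q^n_1, \hat Q^n_2)$, $\hat B^n := (\hat B^n_1, \hat B^n_2)$ and $\hat D^n := (\hat D^n_1, \hat D^n_2)$, by

\[ \hat E^n_k(t) := \frac{E^n_k(t) - \lambda^n t}{\sqrt{n}}, \quad \hat Q^n_k(t) := (\hat X^n_k(t))^+, \quad \hat B^n_k(t) := -(\hat X^n_k(t))^-, \]

\[ \hat D^n_k(t) := \hat X^n_k(0) + \hat A^n(t) - \hat X^n_k(t), \quad t \ge 0, \ k = 1,2. \tag{3.30} \]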


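For $s_1, s_2 \ge 0$, let

\begin{align*}
\hat E^n(s_1, s_2) &:= \frac{1}{\sqrt{n}}\big((E^n_1(s_1) \wedge E^n_2(s_2)) - \lambda^n (s_1 \wedge s_2)\big) \\
&= \big(\hat E^n_1(s_1) + (\lambda^n/\sqrt{n})(s_1 - s_1 \wedge s_2)\big) \wedge \big(\hat E^n_2(s_2) + (\lambda^n/\sqrt{n})(s_2 - s_1 \wedge s_2)\big). \tag{3.31}
\end{align*}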

Before we present the FCLT for the fork-join network with NES in the Halfin-Whitt regime, we provide some preliminaries for the limit processes. The limit processes will be functionals of a generalized multiparameter Kiefer process, which arises as the limit of the multiparameter sequential empirical process driven by the service time vectors of new arrivals. Define the multiparameter sequential empirical processes $\hat K^n := \{\hat K^n(t_1, t_2, x) : t_1 \ge 0, \ t_2 \ge 0, \ x \in \mathbb{R}^2_+\}$ by

\[ \hat K^n(t_1, t_2, x) := \frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor nt_1 \rfloor \wedge \lfloor nt_2 \rfloor} \big(\mathbf{1}(\eta_i \le x) - F(x)\big). \tag{3.32} \]

In Proposition 3.1 we prove the convergence of $\hat K^n$ in the space $\mathbb{D}([0,\infty)^2, \mathbb{D}([0,\infty)^2, \mathbb{R}))$ endowed with a generalized Skorohod $J_1$ topology defined in [18].
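
To make the definition in (3.32) concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that the two service times of a job are independent unit-rate exponentials; the function name `sequential_empirical` and all parameter values are hypothetical.

```python
import math
import random


def sequential_empirical(samples, F, n, t1, t2, x):
    """Evaluate the sequential empirical process in (3.32).

    samples: list of service vectors (eta_{i,1}, eta_{i,2});
    F: joint CDF F(x1, x2); x: pair (x1, x2).
    """
    m = min(math.floor(n * t1), math.floor(n * t2))
    Fx = F(x[0], x[1])
    total = sum(
        (1.0 if (e1 <= x[0] and e2 <= x[1]) else 0.0) - Fx
        for e1, e2 in samples[:m]
    )
    return total / math.sqrt(n)


def F(x1, x2):
    """Joint CDF of two independent unit-rate exponentials (an assumption)."""
    return (1.0 - math.exp(-x1)) * (1.0 - math.exp(-x2))


random.seed(0)
n = 100
samples = [(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(2 * n)]

# Only the smaller of floor(n*t1) and floor(n*t2) matters.
a = sequential_empirical(samples, F, n, 0.5, 2.0, (1.0, 1.0))
b = sequential_empirical(samples, F, n, 0.5, 0.7, (1.0, 1.0))
# At x = (inf, inf) every summand is 1 - 1 = 0.
c = sequential_empirical(samples, F, n, 1.0, 1.0, (math.inf, math.inf))
```

The first two evaluations coincide because the time arguments enter only through $\lfloor nt_1 \rfloor \wedge \lfloor nt_2 \rfloor$, which is also why the limiting covariance depends on the time arguments only through their minimum.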
Proposition 3.1. Under Assumption 1,

\[ \hat K^n(t_1, t_2, x) \Rightarrow \hat K(t_1, t_2, x) \quad \text{in } \mathbb{D}([0,\infty)^2, \mathbb{D}([0,\infty)^2, \mathbb{R})) \text{ as } n \to \infty, \tag{3.33} \]

where $\hat K(t_1, t_2, x)$ is a continuous Gaussian random field, called a generalized multiparameter Kiefer process, with mean $E[\hat K(t_1, t_2, x)] = 0$ and covariance function

\[ \mathrm{Cov}(\hat K(s_1, s_2, x), \hat K(t_1, t_2, y)) = (s_1 \wedge s_2 \wedge t_1 \wedge t_2)(F(x \wedge y) - F(x)F(y)), \tag{3.34} \]

for $s_k, t_k \ge 0$, $k = 1,2$, and $x, y \in \mathbb{R}^2_+$.

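We define the processes $\hat W_k := \{\hat W_k(t) : t \ge 0\}$, $\hat W^c_k := \{\hat W^c_k(t) : t \ge 0\}$ and $\hat W := \{\hat W(t) : t \ge 0\}$ as integral functionals of $\hat K$: for $t \ge 0$, $k = 1,2$,

\[ \hat W_k(t) := \int_0^t \int_0^t \int_{\mathbb{R}^2_+} \mathbf{1}(s_k + x_k \le t)\,d\hat K(\lambda s_1, \lambda s_2, x), \tag{3.35} \]

\[ \hat W(t) := \int_0^t \int_0^t \int_{\mathbb{R}^2_+} \mathbf{1}(s_j + x_j \le t, \ \forall j)\,d\hat K(\lambda s_1, \lambda s_2, x), \tag{3.36} \]

and

\[ \hat W^c_k(t) := \hat W_k(t) - \hat W(t) = \int_0^t \int_0^t \int_{\mathbb{R}^2_+} \mathbf{1}(s_k + x_k \le t, \ s_{k'} + x_{k'} > t)\,d\hat K(\lambda s_1, \lambda s_2, x), \tag{3.37} \]

where the integrals are defined in the sense of mean-square limits (see the precise definition in §??).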

Proposition 3.2. The processes $\hat W_k$, $\hat W^c_k$ and $\hat W$ are well-defined continuous Gaussian processes with mean zero, and for $0 \le s \le t$ and $k = 1,2$,

\[ E[(\hat W_k(t) - \hat W_k(s))^2] = \lambda \int_0^t (F_k(t - u) - F_k(s - u))(1 - F_k(t - u) + F_k(s - u))\,du, \]

\begin{align*}
E[(\hat W(t) - \hat W(s))^2] = \lambda \int_0^t \int_0^t & [\Delta F((s - s_1, s - s_2); (t - s_1, t - s_2))] \\
& \times [1 - \Delta F((s - s_1, s - s_2); (t - s_1, t - s_2))]\,d(s_1 \wedge s_2), \tag{3.38}
\end{align*}

\begin{align*}
E[(\hat W^c_k(t) - \hat W^c_k(s))^2] = {} & E[(\hat W_k(t) - \hat W_k(s))^2] + E[(\hat W(t) - \hat W(s))^2] \\
& - 2\lambda \int_0^t \int_0^t [F(t - s_1, t - s_2) - F_{k,k'}(s - s_k, t - s_{k'}) \\
& \quad + (F_k(t - s_k) - F_k(s - s_k))(F(s - s_1, s - s_2) - F(t - s_1, t - s_2))]\,d(s_1 \wedge s_2),
\end{align*}

and covariance functions

\[ \mathrm{Cov}(\hat W_k(t), \hat W_{k'}(t)) = \lambda \int_0^t \int_0^t [F(t - s_1, t - s_2) - F_k(t - s_k)F_{k'}(t - s_{k'})]\,d(s_1 \wedge s_2), \]

\[ \mathrm{Cov}(\hat W_k(t), \hat W^c_{k'}(t)) = \lambda \int_0^t \int_0^t [F_k(t - s_k)F(t - s_1, t - s_2) - F_k(t - s_k)F_{k'}(t - s_{k'})]\,d(s_1 \wedge s_2), \]

\[ \mathrm{Cov}(\hat W_k(t), \hat W(t)) = \lambda \int_0^t \int_0^t [F(t - s_1, t - s_2) - F_k(t - s_k)F(t - s_1, t - s_2)]\,d(s_1 \wedge s_2), \]

\[ \mathrm{Cov}(\hat W^c_k(t), \hat W(t)) = \lambda \int_0^t \int_0^t [(F(t - s_1, t - s_2))^2 - F_k(t - s_k)F(t - s_1, t - s_2)]\,d(s_1 \wedge s_2), \]

where $F_{k,k'}(x, y) := P(\eta_{1,k} \le x, \ \eta_{1,k'} \le y)$ for $x, y \in \mathbb{R}_+$, and

\[ \Delta F(x; y) := F(y_1, y_2) - F(x_1, y_2) - F(y_1, x_2) + F(x_1, x_2), \quad x, y \in \mathbb{R}^2_+, \ x \le y. \]
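
The rectangle increment $\Delta F(x; y)$ is the probability that the service vector falls in the rectangle $(x_1, y_1] \times (x_2, y_2]$. A minimal sketch follows; the function name `rect_increment` and the two joint distributions (independent exponentials and a perfectly dependent uniform pair) are assumptions chosen for illustration.

```python
import math


def rect_increment(F, x, y):
    """Delta F(x; y) = F(y1,y2) - F(x1,y2) - F(y1,x2) + F(x1,x2), x <= y componentwise."""
    return F(y[0], y[1]) - F(x[0], y[1]) - F(y[0], x[1]) + F(x[0], x[1])


def F_indep(a, b):
    """Independent exponentials with rates 1 and 2 (illustrative)."""
    return (1.0 - math.exp(-a)) * (1.0 - math.exp(-2.0 * b))


def F_comonotone(a, b):
    """Both service times equal to the same Uniform[0, 1] variable (illustrative)."""
    return min(max(min(a, b), 0.0), 1.0)


# Independent case: the increment factorizes into two marginal increments.
d_indep = rect_increment(F_indep, (0.5, 0.2), (1.5, 1.0))
d_product = (math.exp(-0.5) - math.exp(-1.5)) * (math.exp(-0.4) - math.exp(-2.0))
# Perfectly dependent case: a rectangle off the diagonal carries no mass.
d_off_diag = rect_increment(F_comonotone, (0.0, 0.5), (0.5, 1.0))
```

The second example shows how dependence between the parallel tasks of a job enters the variance in (3.38) only through the joint distribution $F$.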

In addition, let $\hat U := \{\hat U(t) : t \in \mathbb{R}^2_+\}$ be a continuous two-parameter Gaussian process with mean zero and covariance function

\[ \mathrm{Cov}(\hat U(s), \hat U(t)) = F_{1,e}(s_1 \wedge t_1)F_{2,e}(s_2 \wedge t_2) - F_{1,e}(s_1)F_{2,e}(s_2)F_{1,e}(t_1)F_{2,e}(t_2), \tag{3.39} \]

for $s := (s_1, s_2) \in \mathbb{R}^2_+$ and $t := (t_1, t_2) \in \mathbb{R}^2_+$. Define $\hat U_k := \{\hat U_k(t) : t \ge 0\}$, for $k = 1,2$, by

\[ \hat U_1(t) := \hat U(t, \infty), \quad \hat U_2(t) := \hat U(\infty, t), \quad t \ge 0, \tag{3.40} \]

and, with a slight abuse of notation, we denote $\hat U(t) = \hat U(t, t)$, $t \ge 0$. Note that the processes $\hat W_k$, $\hat W^c_k$ and $\hat W$ are independent of $\hat U$, as well as of $\hat U_k$, $k = 1,2$.
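
As a quick check on (3.39) and (3.40), the marginal covariance of $\hat U_1$ follows by letting the second coordinate tend to infinity, using only $F_{2,e}(\infty) = 1$:

```latex
\begin{align*}
\mathrm{Cov}(\hat U_1(s), \hat U_1(t))
&= F_{1,e}(s \wedge t)\,F_{2,e}(\infty) - F_{1,e}(s)\,F_{2,e}(\infty)\,F_{1,e}(t)\,F_{2,e}(\infty) \\
&= F_{1,e}(s \wedge t) - F_{1,e}(s)\,F_{1,e}(t), \qquad s, t \ge 0.
\end{align*}
```

This is the covariance of a standard Brownian bridge evaluated at the time change $F_{1,e}(\cdot)$; the same computation applies to $\hat U_2$ with $F_{2,e}$.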

We are now ready to state the FCLT. 

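Theorem 3.2. Under Assumptions 1 and 4-8,

\[ (\hat A^n, \hat X^n, \hat Y^n, \hat S^n, \hat E^n, \hat Q^n, \hat B^n, \hat D^n) \Rightarrow (\hat A, \hat X, \hat Y, \hat S, \hat E, \hat Q, \hat B, \hat D) \tag{3.41} \]

in $\mathbb{D}^{14}$ as $n \to \infty$, where $\hat A$ is in (3.18), and $\hat X$, $\hat Y$ and $\hat S$ are the unique solutions to the following set of stochastic integral equations: for $t \ge 0$ and $k = 1,2$,

\begin{align*}
\hat X_k(t) = {} & \hat X^0_k(t) - N_k \beta_k F_{k,e}(t) - \bar J(0)^{1/2}\hat U_k(t) - \bar Y_{k'}(0)^{1/2} B^0_k(F_{k,e}(t)) \\
& + \int_0^t (\hat X_k(t - s))^+\,dF_k(s) + \int_0^t F^c_k(t - s)\,d\hat A(s) - \hat W_k(t), \tag{3.42}
\end{align*}

\begin{align*}
\hat Y_k(t) = {} & \hat Y^0_k(t) + N_k \beta_k F_{k,e}(t) - \bar Y_k(0)^{1/2} B^0_{k'}(F_{k',e}(t)) + \bar J(0)^{1/2}(\hat U_k(t) - \hat U(t)) \\
& - \int_0^t (\hat X_k(t - s))^+\,dF_k(s) + \int_0^t F_k(t - s)\,d\hat A(s) + \hat W^c_k(t) - \hat\Psi(t), \tag{3.43}
\end{align*}

\[ \hat S(t) = \hat S^0(t) + \bar Y_2(0)^{1/2} B^0_1(F_{1,e}(t)) + \bar Y_1(0)^{1/2} B^0_2(F_{2,e}(t)) + \bar J(0)^{1/2}\hat U(t) + \hat W(t) + \hat\Psi(t), \tag{3.44} \]

and $\hat E$, $\hat Q$, $\hat B$ and $\hat D$ are given as follows:

\[ \hat E_k(t) = \hat A(t) - (\hat X_k(t))^+, \quad \hat D_k(t) = \hat X_k(0) + \hat A(t) - \hat X_k(t), \]
\[ \hat Q_k(t) = (\hat X_k(t))^+, \quad \hat B_k(t) = -(\hat X_k(t))^-, \tag{3.45} \]

where

\[ \hat X^0_k(t) := \hat X_k(0)F^c_{k,e}(t) + (\hat X_k(0))^+ (F^c_k(t) - F^c_{k,e}(t)), \tag{3.46} \]

\[ \hat S^0(t) := \sum_{k=1}^{2} \big(\hat Y_{k'}(0)F_{k,e}(t) + \hat Z_{k'}(0)F_k(t)F_{k',e}(t)\big) + \hat J(0)F_{1,e}(t)F_{2,e}(t) + \hat I(0)F_m(t), \tag{3.47} \]

\[ \hat Y^0_k(t) := \hat Y_k(0) + \hat X_k(0)F_{k,e}(t) + (\hat X_k(0))^+ (F_k(t) - F_{k,e}(t)) - \hat S^0(t), \tag{3.48} \]

the processes $B^0_k := \{B^0_k(t) : t \ge 0\}$, $k = 1,2$, are independent standard Brownian bridges, the process $\hat U$ is a continuous two-parameter Gaussian process defined above with the processes $\hat U_1$ and $\hat U_2$ defined in (3.40), the processes $\hat W_k$, $\hat W^c_k$ and $\hat W$ are defined in (3.35), (3.37) and (3.36), $B^0_k$ is independent of $\hat U$ and of $\hat W_k$, $\hat W^c_k$ and $\hat W$, and the process $\hat\Psi := \{\hat\Psi(t) : t \ge 0\}$ defined by

\[ \hat\Psi(t) := \int_0^t \int_0^t \hat E(t - s_1, t - s_2)\,dF(s_1, s_2), \tag{3.49} \]

is a well-defined continuous process, where, for $s_1, s_2 \ge 0$,

\[ \hat E(s_1, s_2) := \hat E_1(s_1)\mathbf{1}(s_1 < s_2) + \hat E_2(s_2)\mathbf{1}(s_2 < s_1) + (\hat E_1(s_1) \wedge \hat E_2(s_2))\mathbf{1}(s_1 = s_2). \tag{3.50} \]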
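
The piecewise form in (3.50) can be read as the large-$n$ limit of the minimum in (3.31). The sketch below illustrates this under the assumption that $\lambda^n/\sqrt{n} \to \infty$ (for example, $\lambda^n$ of order $n$); the function names and the fixed values used for the diffusion-scaled quantities are hypothetical.

```python
import math


def e_prelimit(e1, e2, s1, s2, lam_n, n):
    """Second form in (3.31); e1, e2 stand for the diffusion-scaled E_1^n(s1), E_2^n(s2)."""
    m = min(s1, s2)
    drift = lam_n / math.sqrt(n)
    return min(e1 + drift * (s1 - m), e2 + drift * (s2 - m))


def e_limit(e1, e2, s1, s2):
    """Piecewise form in (3.50)."""
    if s1 < s2:
        return e1
    if s2 < s1:
        return e2
    return min(e1, e2)


n = 10**6
lam_n = float(n)  # assumption: arrival rate of order n, so lam_n / sqrt(n) = 1000
e1, e2 = 1.3, -0.4

cases = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]
pre = [e_prelimit(e1, e2, s1, s2, lam_n, n) for s1, s2 in cases]
lim = [e_limit(e1, e2, s1, s2) for s1, s2 in cases]
```

When $s_1 < s_2$, the second argument of the minimum carries a drift of order $\sqrt{n}$ and drops out, leaving the first argument; the roles reverse when $s_2 < s_1$, and both survive on the diagonal.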

We remark that the limit processes X k , k = 1,2, have the same structure as the unique solution 
to an integral convolution equation, as shown in Reed [45], but are also different because they are 
both driven by the same generalized multiparameter Kiefer process K defined in Proposition 3.1. 
These two limiting processes X k , k = 1,2, are correlated because of the correlated service times of 
the parallel tasks of each job, which is captured by the process K , as well as the same arrival limit 
process A. In fact, these two processes K and A as well as the limits associated with the initial 
quantities are the driving stochastic components of all the limit processes in (3.42)-(3.45). 
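
As a consistency check, (3.43) can be recovered from (3.42) and (3.44). The sketch assumes that the limits inherit the balance relation (3.29), that $\hat A(0) = 0$, and that $\hat X^0_k$ and $\hat Y^0_k$ are as in (3.46) and (3.48):

```latex
\begin{align*}
\hat Y_k(t)
&= \hat Y_k(0) + \hat X_k(0) + \hat A(t) - \hat X_k(t) - \hat S(t) \\
&= \underbrace{\hat Y_k(0) + \hat X_k(0) - \hat X^0_k(t) - \hat S^0(t)}_{=\,\hat Y^0_k(t)}
   + N_k \beta_k F_{k,e}(t)
   - \bar Y_k(0)^{1/2} B^0_{k'}(F_{k',e}(t)) \\
&\quad + \bar J(0)^{1/2}\big(\hat U_k(t) - \hat U(t)\big)
   - \int_0^t (\hat X_k(t-s))^+ \, dF_k(s)
   + \int_0^t F_k(t-s)\, d\hat A(s)
   + \hat W^c_k(t) - \hat\Psi(t).
\end{align*}
```

Three cancellations are used: the term $\bar Y_{k'}(0)^{1/2} B^0_k(F_{k,e}(t))$ appears with opposite signs in $-\hat X_k$ and $-\hat S$; $\hat A(t) - \int_0^t F^c_k(t-s)\,d\hat A(s) = \int_0^t F_k(t-s)\,d\hat A(s)$; and $\hat W_k - \hat W = \hat W^c_k$ by (3.37).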


4 Concluding Remarks and Future Work 

Owing to space constraints, we only remark on the main ideas of the proofs of the limit theorems. The 
main difficulty in the study of many-server fork-join networks with NES is the resequencing of 
arrival orders after service completion at each service station. Tasks of distinct jobs must be 
differentiated and tracked in order to describe the waiting buffer dynamics for synchronization. To 
mathematically describe the system dynamics, we develop a new approach using multiparameter 
sequential empirical processes driven by service vectors for parallel tasks of each job, as depicted 
in Figure 2. This approach is used to establish FLLNs and FCLTs for the waiting buffer processes 
for synchronization and the service processes jointly in the fundamental fork-join network where 
all service stations are operating in the many-server heavy-traffic regimes. 

As a prerequisite, we first establish a new FCLT for multiparameter sequential empirical processes driven by random vectors (Theorem 2.1). To prove Theorem 2.1, we employ the standard approach of establishing convergence of finite-dimensional distributions and tightness [?, 20, 55]. The convergence of finite-dimensional distributions follows from the strong convergence result of multiparameter empirical processes in [42]. To prove tightness, we present a new decomposition property for multiparameter sequential empirical processes, which splits them into a multiparameter martingale [23, 19] and a second term of finite variation. We apply properties of multiparameter martingales [23, 19] and strong approximations of random walks by Brownian motions (see section 3.5 in [30]) to show the tightness of those two decomposed terms, respectively. This decomposition also plays a very important role in proving the tightness of the number of tasks in each waiting buffer for synchronization, the number of tasks in each parallel service station and the number of synchronized jobs. Specifically, the aforementioned processes can be decomposed into a linear combination of three terms: an integral functional of the arrival process and two other terms from the decomposition of the multiparameter sequential empirical process. We apply Aldous' tightness criterion (see, e.g., Lemma 3.7 in [28]) and another tightness criterion for processes with proper decompositions satisfying certain conditions (Lemma VI.3.32 in [20]) to verify the tightness of the two terms related to the sequential empirical process driven by the service vector, respectively. 

The proofs of the limit theorems in the QD regime can be regarded as generalizations of those 
for G/GI/∞ queues in [28]. However, since all the processes, X, Y and S, are represented via 
the multiparameter sequential empirical processes driven by the service vectors, many technical 
challenges must be addressed in the multiparameter setting, for example, using multiparameter $L^2$ 
martingales, and mean-square limits of (integral functionals of) multiparameter processes defined 
on $\mathbb{R}^k$ ($k \ge 2$). One important advantage of our new approach is that the diffusion-scaled limit 
processes for X, Y and S are all functionals of two independent processes: the arrival limit and 
the multiparameter generalized Kiefer process driven by the service vector (Theorem 2.3). From 
that, the characterization of the joint transient and stationary distributions of these processes is 
made possible (Theorem 2.4). 

The proofs in the QED regime are based on the important observations that the system dynamics 
of G/GI/n queues can be represented via the corresponding G/GI/∞ service dynamics 
[45], and that waiting times in the QED regime are $O(1/\sqrt{n})$ while service times are $O(1)$. For 
the fork-join network, we represent the dynamics of X, Y and S via those of the corresponding 
infinite-server fork-join network, where the times of entering service in the model are regarded as the 
“arrival” times for the corresponding infinite-server fork-join network, as shown in Figure 2(b). 
The observation that the times of entering service at the parallel stations differ by order 
$O(1/\sqrt{n})$ is key to proving the joint convergence of the aforementioned processes. On the other 
hand, since we have to simultaneously handle the waiting times of all parallel tasks and work with 
multiparameter sequential empirical processes, we must develop new techniques to prove tightness, 
including establishing new properties for multiparameter $L^2$ martingales, and identifying a new 
multivariate integral mapping to apply the continuous mapping theorem. 

We believe that a general framework has been developed to study fork-join networks with 
NES in the many-server heavy-traffic regimes (QD and QED). It can be potentially used to study 
performance evaluation, capacity allocation, and control problems in multi-class fork-join networks 
under NES with multi-stage processing. We want to find optimal scheduling and routing policies 
such that delays for synchronization as well as delays for service can be minimized, particularly, 
reducing delays for synchronization to be of a smaller order than service. We also want to find 
optimal staffing policies to stabilize delays for synchronization in addition to delays for service 
when arrival rates are time inhomogeneous. Our methods can be extended to investigate reliability 
of many-server fork-join networks under NES in random environments (e.g., service disruptions). 
Fork-join networks with NES are more likely to suffer from service disruptions due to the structural 
complexity of parallel and sequential task processing. Component-level unreliability can be greatly 
amplified by the network's large scale. We will extend our approach to investigate the impact of service 
disruptions in one or multiple service stations upon system congestion, particularly, delays for 
synchronization and throughput. 